Abstract

Keeping the local times of processes in a distributed system synchronized in the presence of 
arbitrary faults is important in many applications and is an interesting theoretical problem in its 
own right. In order to be practical, any algorithm to synchronize clocks must be able to deal with 
process failures and repairs, clock drift, and varying message delivery times, but these conditions 
complicate the design and analysis of algorithms. In this thesis, a general formal model to 
describe a system of distributed processes, each of which has its own clock, is presented. The 
processes communicate by sending messages to each other, and they can set timers to cause 
themselves to take steps at some future times. It is proved that even if the clocks run at a perfect 
rate and there are no failures, an uncertainty of ε in the known message delivery time makes it 
impossible to synchronize the clocks of n processes any more closely than 2ε(1 - 1/n). A simple 
algorithm that achieves this bound is given to show that the lower bound is tight. 

Two fault-tolerant algorithms are presented and analyzed, one to maintain synchronization 
among processes whose clocks initially are close together, and another to establish 
synchronization in the first place. Both handle drift in the clock rates, uncertainty in the message 
delivery time, and arbitrary failure of just under one third of the processes. The maintenance 
algorithm can be modified to allow a failed process that has been repaired to be reintegrated into 
the system. A variant of the maintenance algorithm is used to establish the initial synchronization. 
It was also necessary to design an interface between the two algorithms since we envision the 
processes running the start-up algorithm until the desired degree of synchronization is obtained, 
and then switching to the maintenance algorithm. 

Chapter One 
Introduction 



1.1 The Problem 

Keeping the local times of processes in a distributed system synchronized in the presence of 
arbitrary faults is important in many applications and is an interesting problem in its own right. In 
order to be practical, any algorithm to synchronize clocks must be able to deal with process 
failures and repairs, clock drift, and varying message delivery times, but these conditions 
complicate the design and analysis of algorithms. 

In this thesis we describe a formal model for a system of distributed processes with clocks, and 
demonstrate a lower bound on how closely the clocks can be synchronized, even when strong 
assumptions are made about the behavior of the system. Then we describe and analyze 
algorithms to establish and maintain synchronization under more realistic assumptions. 

We assume a collection of processes that communicate by sending messages over a reliable 
medium. Each process has a physical clock, not under its control, that is incremented in some 
relationship with real time. By adding the value of a local variable to the value of the physical 
clock, the process obtains its local time. 

The design of a clock synchronization algorithm must take into account the following factors. 

1. The uncertainty in the message delivery time. Messages are assumed in this thesis to 
be delivered a fixed amount of time after they are sent, plus or minus some 
uncertainty. 

2. Clock drift. Are the processes' clock rates fast or slow relative to real time? If the 
clocks drift, then the synchronization procedure must be repeated periodically to 
keep the clocks synchronized. 

3. Are the clocks initially synchronized? If they are, then the problem of synchronizing 
the clocks is already solved unless the clocks drift, since once nondrifting clocks are 
synchronized, they stay synchronized. 

4. Fault tolerance. What kinds of faults (if any) are tolerated? This thesis does not 
consider communication link failures. A certain proportion of the processes, 
however, may be faulty in the worst possible way, by sending arbitrary messages at 
arbitrary times. 

5. Digital signatures. Can a faulty process forge a message from another process? If 
digital signatures are available, then process p can tell process q that it received a 
message x from process r, only if such was actually the case. This obviously reduces 
the power of a faulty process to create havoc. Some of the other clock 
synchronization algorithms in the literature [5, 7] need this capability, but ours do not. 

6. Reintegration. In order to be practical, a synchronization algorithm must allow faulty 
processes that have recovered to be reintegrated into the system. 

7. Size of the adjustment. Particularly when the synchronization procedure is 
performed periodically, the amount by which the clock is changed should not be too 
big. 



1.2 Results of the Thesis 

1.2.1 Model 

One of the contributions of this thesis is a precise formal model of a system of distributed 
processes, each of which has its own clock. Within the model, lower bound proofs can be seen to 
be rigorous, and the effects of algorithms, once they are stated in a language that maps to the 
model, can be discerned unambiguously. The model is described in Chapter 2. 

We model the situation in which each process has a physical clock that is not under its control. 
By adding some value to the physical clock time a process obtains a local time. A process can set 
a timer to go off at a specified time in the future. Formally, timers are treated similarly to 
messages between processes. The system is interrupt-driven in that a process only takes a step 
when a message arrives. The message may come from another process, or it may be a timer that 
was set by the process itself. Thus, by using a timer, a process can ensure that an interrupt will 
occur at a specified time in the future. 

A process is modelled as an automaton, with states and a transition function. One of the 
arguments to the transition function is a real number, representing the time on the process' clock. 
Clocks are modelled as real-valued functions from real time to clock time. We assume that the 
communication network is fully connected, so that every process can send a message directly to 
every other process. Processes possess the capability of broadcasting a message to all the 
processes at the same time. The message system is described as a buffer that holds messages 
until they are delivered. All messages are delivered within a fixed amount of time plus or minus 
some uncertainty. The delivery of a message at a process is the only type of event we consider. A 
system execution consists of sequences of "actions", each of which is a process event 
surrounded by a description of the state of the system, one sequence for each real time of 
interest. The sequences must satisfy certain natural consistency and correctness conditions. 

1.2.2 Lower Bound 

Even if the simplifying assumptions are made that clocks run at a perfect rate and that there are 
no failures, the presence of an uncertainty of ε in the message delivery time alone prevents any 
algorithm from exactly synchronizing clocks that initially have arbitrary values. We show in 
Chapter 3 that 2ε(1 - 1/n) is a lower bound on how closely the clocks of n processes can be 
synchronized in this case. Of course, in this case, any algorithm which synchronizes the clocks 
once causes them to remain synchronized. However, since these are strong assumptions, this 
lower bound also holds for the more realistic case in which clocks do drift and arbitrary faults 
occur. Just to show that this bound is tight, we describe an algorithm that achieves this bound for 
the simplified case. 
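
To give a feel for the size of this bound, the following minimal sketch evaluates 2ε(1 - 1/n) for a few system sizes. The function name sync_lower_bound and the sample values are our own illustrative choices, not notation from the thesis.

```python
def sync_lower_bound(eps: float, n: int) -> float:
    """Evaluate 2*eps*(1 - 1/n), the lower bound stated in the text.

    eps is the uncertainty in the message delivery time and n is the
    number of processes. The name of this helper is hypothetical.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return 2 * eps * (1 - 1 / n)


# With eps fixed, the bound grows from eps (two processes) toward 2*eps.
for n in (2, 3, 10, 100):
    print(n, sync_lower_bound(1.0, n))
```

For two processes the bound equals ε, and it approaches 2ε as n grows.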

1.2.3 Maintaining Synchronization 

We describe a synchronization algorithm in Chapter 4 that handles clock drift, uncertainty in the 
message delivery time and arbitrary process faults. The algorithm requires the clocks to be 
initially close together and less than one third of the processes to be faulty. 

Our algorithm runs in rounds, resynchronizing every so often to correct for the clocks drifting out 
of synchrony, and using a fault-tolerant averaging function based on those in [1] to calculate an 
adjustment. The size of the adjustment made to a clock at each round is independent of the 
number of faulty processes. At each round, n^2 messages are required, where n is the total 
number of processes. The closeness of synchronization achieved depends only on the initial 
closeness of synchronization, the message delivery time and its uncertainty, and the drift rate. 
Since the closeness of synchronization depends on the initial closeness, this is, in the terminology 
of [7], an interactive convergence algorithm. We give explicit bounds on how the difference 
between the clock values and real time grows. The algorithm can be easily adapted to become a 
reintegration procedure for repaired processes. 

At the beginning of each round, every nonfaulty process broadcasts its clock value and then waits 
a bounded amount of time, measured on its logical clock, long enough to ensure that clock values 
are received from all nonfaulty processes. After waiting, the process averages the arrival times of 
all the messages received, using a particular fault-tolerant averaging function. The resulting 
average is used to calculate an adjustment to the process' clock. 

The fault-tolerant averaging function is derived from those used in [1] for reaching approximate 
agreement. The function is designed to be immune to some fixed maximum number, f, of faults. It 
first throws out the f highest and f lowest values, and then applies some ordinary averaging 
function to the remaining values. We choose the midpoint of the range of the remaining values, to 
be specific. The properties of the fault-tolerant averaging function allow the distance between the 
clocks to be halved, in a rough sense, at each round. Consequently, the averaging function can 
be considered the heart of the algorithm. 
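
The averaging step described above can be sketched as follows. This is only an illustration under our own assumptions: the function name fault_tolerant_midpoint, the list representation of the collected values, and the sample numbers are hypothetical, and the sketch omits how the result is turned into a clock adjustment.

```python
def fault_tolerant_midpoint(values, f):
    """Discard the f highest and f lowest values, then return the
    midpoint of the range of the values that remain.

    `values` stands for the arrival times collected during a round and
    `f` for the maximum number of faults to be tolerated.
    """
    if len(values) <= 2 * f:
        raise ValueError("need more than 2f values to discard 2f of them")
    kept = sorted(values)[f:len(values) - f]
    return (kept[0] + kept[-1]) / 2


# Seven values, two of them wildly wrong (as a faulty process might send).
arrivals = [100, 102, 104, 101, 990, -500, 103]
print(fault_tolerant_midpoint(arrivals, 2))  # midpoint of [101, 102, 103]
```

Because the f extreme values on each side are dropped, up to f arbitrary values cannot pull the result outside the range of the values reported by nonfaulty processes.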

This algorithm can maintain a closeness of synchronization of approximately 4ε, where ε is the 
uncertainty in the message delivery time. 

1.2.4 Establishing Synchronization 

The problem solved by the algorithm in Chapter 4 is only that of maintaining synchronization of 
local times once it has been established. There is, of course, the separate problem of establishing 
such synchronization in the first place among processes whose clocks have arbitrary values. A 
variant of the maintenance algorithm can be used to establish the initial synchronization as well 
and is described in Chapter 5. The algorithm handles arbitrary failures of the processes, 
uncertainty in the message delivery time, and clock drift. It was also necessary to design an 
interface between the two algorithms since we envision the processes running this algorithm until 
the desired degree of synchronization is obtained, and then switching to the maintenance 
algorithm. 

The structure of the algorithm is similar to that of the algorithm which maintains synchronization. 
It runs in rounds. During each round, the processes exchange clock values and use the same 
fault-tolerant averaging function as before to calculate the corrections to their clocks. However, 
each round contains an additional phase, in which the processes exchange messages to decide 
that they are ready to begin the next round. 

This algorithm also synchronizes the clocks to within about 4ε. Again, the fault-tolerant averaging 
function used in the algorithm causes the difference in the clocks to be cut in half at each round. 



1.3 Related Work 

The problem of synchronizing clocks has been a topic of interest recently. A seminal paper was 
Lamport's work [6], defining logical clocks and describing an algorithm to synchronize them. 
Several algorithms to synchronize real time clocks have appeared in the literature [5, 6, 7, 9]. 
Those of Lamport [6] and Marzullo [9] have the processes updating their clocks whenever they 
receive an appropriate message; these messages are assumed to arrive every so many real 
seconds, or more often. In contrast, the algorithms in Halpern, Simons and Strong [5], Lamport 
and Melliar-Smith [7], and this thesis run in rounds. During a round, a process updates its clock 
once. The rounds are determined by the times at which different processes' local clocks reach 
the same times. There is an impossibility result due to Dolev, Halpern and Strong [2], showing 
that it is impossible to synchronize clocks without digital signatures if one third or more of the 
processes are subject to Byzantine failures. Dolev, Halpern and Strong's paper [2] also contains 
a lower bound similar to ours (proved independently), but characterizing the closeness of 
synchronization obtainable along the real time axis, that is, a lower bound on how closely in real 
time two processes' clocks can read the same value. 

The three algorithms of Lamport and Melliar-Smith [7], as well as our maintenance algorithm, 
require a reliable, completely connected communication network, and handle arbitrary process 
faults. The first algorithm works by having each process at every round read all the other 
processes' clocks and set its clock to the average of those values that aren't too different from its 
own. The size of the adjustment is no more than the amount by which the clocks differ plus the 
uncertainty in obtaining the other processes' clock values. However, the closeness of the 
synchronization achieved depends on the total number of processes, n. The message complexity 
is n^2 at each round, if getting another process' clock value is equated with sending a message. 

In the other two algorithms in [7], each process sets its clock to the median of the values obtained 
by receiving messages from the other processes. To make sure each nonfaulty process has the 
same set of values, the processes execute a Byzantine Agreement protocol on the values. The 
two algorithms use different Byzantine Agreement protocols. One of the protocols doesn't 
require digital signatures, whereas the other one does. As a result, the clock synchronization 
algorithm derived from the latter will work even if almost one half of the processes are faulty, while 
the other two algorithms in [7] can only handle less than one third faulty processes. For both of 
the Byzantine clock synchronization algorithms, the closeness of synchronization and the size of 
the adjustment depend on the number of faulty processes, and the number of messages per 
round is exponential in the number of faults. 

The algorithm of Halpern, Simons and Strong [5] works in the presence of any number of process 
and link failures as long as the nonfaulty processes can still communicate. It requires digital 
signatures. When a process' clock reaches a certain value (decided on in advance), it broadcasts 
that time. If it receives a message containing the value not too long before it reaches the value, it 
updates its clock to the value and relays the message. The closeness of synchronization depends 
only on the drift rate, the round length, the message delivery time, and the diameter of the 
communication graph after the faulty elements are removed. The message complexity per round 
is n^2. However, the size of the adjustment depends on the number of faulty processes. 

The framework and error model used by Marzullo in [9] make a direct comparison of his results 
with ours difficult. He considers intervals of time and analyzes the error probabilistically. 

The problem addressed in these papers is only that of maintaining synchronization of local times 
once it has been established. None of them explicitly discusses any sort of validity condition, 
quantifying how clock time increases in relation to real time. Only [5] includes a reintegration 
procedure for repaired processes. 

Chapter Two 
Formal Model 



2.1 Introduction 

We present a formal model for describing a system of distributed processes, each of which has its 
own clock. The processes communicate by sending messages to each other, and they can set 
timers to cause themselves to take steps at some specified future times. The model is designed to 
handle arbitrary clock rates, Byzantine process failures, and a variety of assumptions about the 
behavior of the message system. 

The advantages of a formal model are that lower bound proofs can be seen to be rigorous, and 
the effects of an algorithm, once it is stated in a language that maps to the model, can be 
discerned unambiguously. 

This model will be used in subsequent chapters to describe our particular versions of the clock 
synchronization problem. 



2.2 Informal Description 

We model a distributed system consisting of a set of processes that communicate by sending 
messages to each other. Each process has a physical clock that is not under its control. 

A typical message consists of text and the sending process' name. There are also two special 
messages, START, which comes from an external source and indicates that the recipient should 
begin the algorithm, and TIMER, which a process receives when its physical clock has reached a 
designated time. 

A process is modelled as an automaton with a set of states and a transition function. The 
transition function describes the new state the process enters, the messages it sends out, and the 
timers it sets for itself, all as a function of the process' current state, received message and 
physical clock time. An application of the transition function constitutes a process step, the only 
kind of event in our model. 

The system is interrupt-driven in that a process only takes a step when a message arrives. The 
message may come from another process, or it may be a TIMER message that was sent by the 
process itself. Thus, by using a TIMER message, a process can ensure that an interrupt will occur 
at a specified time in the future. We neglect local processing time by assuming that the 
processing of an arriving message is instantaneous. 

We assume that the communication network is fully connected, so that every process can send a 
message directly to every other process. Processes possess the capability of broadcasting a 
message to all the processes at one step. The message system is described as a buffer that holds 
messages until they are delivered. 

System histories consist of sequences of "actions", each of which is a process event surrounded 
by a description of the state of the system, one sequence for each real time of interest. The 
sequences must satisfy certain natural consistency and correctness conditions. We introduce the 
notion of "shifting" the real times at which a particular process' steps occur in a history and note 
the resulting changes to the message delivery times. Finally, we define an execution to be a 
history in which the message system behaves as desired. 



2.3 Systems of Processes 

Let P be a fixed set of process names. Let X be a fixed set of message values. Then M, the set of 
messages, is {START, TIMER} ∪ (X × P). A process receives a START message as an external 
indication of the beginning of an algorithm. A process receives a TIMER message when a 
specified time has been reached on its physical clock. All other messages consist of a message 
value and a process name, indicating the sender of the message. 

Let ℱ(S) denote the finite subsets of the set S. 

A process p is modelled as an automaton. It has a set Q of states, with a distinguished subset I of 
initial states, and a distinguished subset F of final states. It has a transition function, τ_p, where 
τ_p: Q × R × M → Q × ℱ(X × P) × ℱ(R). The transition function maps p's state, a real number indicating its 
physical clock time, and an incoming message, all to a new state for p, a finite set of (message 
value, destination) pairs, and a finite set of times at which to set timers. For any r in R, m in M, Y in 
ℱ(X × P), and Z in ℱ(R), if q is in F and if τ_p(q,r,m) = (q',Y,Z), we require that q' also be in F. That is, 
once a process is in a final state, it can never change to a non-final state. 
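
As an informal illustration of the shape of such a transition function, the sketch below encodes a toy process in Python. Everything specific in it is an assumption made for the example: the state names "idle", "waiting" and "done", the process names in PEERS, the 5.0 timer offset, and the function name tau are hypothetical and do not come from the thesis.

```python
START, TIMER = "START", "TIMER"   # the two special messages
PEERS = ("p", "q", "r")           # hypothetical process names (the set P)
FINAL_STATES = {"done"}           # hypothetical set F of final states


def tau(state, clock_time, msg):
    """Toy transition function with the signature described in the text.

    Inputs: current state, physical clock time, incoming message
    (START, TIMER, or a (value, sender) pair).
    Output: (new state, finite set of (value, destination) pairs,
    finite set of clock times at which to set timers).
    """
    if state in FINAL_STATES:
        # Once in a final state, the process stays in a final state.
        return (state, frozenset(), frozenset())
    if msg == START:
        sends = frozenset((f"clock={clock_time}", dest) for dest in PEERS)
        return ("waiting", sends, frozenset({clock_time + 5.0}))
    if msg == TIMER:
        return ("done", frozenset(), frozenset())
    # Any other message leaves this toy process unchanged.
    return (state, frozenset(), frozenset())


new_state, sends, timers = tau("idle", 3.0, START)
print(new_state, sorted(sends), sorted(timers))
```

The sets are frozen to reflect that a step produces a fixed, finite collection of messages and timers; a faulty process, by contrast, is not bound by any such function.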

We assume that, in the absence of non-TIMER messages, a process does not set an infinite 
sequence of timers for itself within a finite amount of time. To state this condition formally, we 
choose any time r_1 and state q_1 for p, and consider the following sequence of applications of τ_p: 

τ_p(q_1, r_1, TIMER) = (q_2, Y_2, Z_2) 

τ_p(q_2, r_2, TIMER) = (q_3, Y_3, Z_3), where r_2 = min{r ∈ Z_2 : r > r_1} 

... 

τ_p(q_i, r_i, TIMER) = (q_{i+1}, Y_{i+1}, Z_{i+1}), where r_i = min{r ∈ ∪_{2≤j≤i} Z_j : r > r_{i-1}} 

Then as i approaches ∞, it must be that r_i approaches ∞. 

We define a step of p to be a tuple (q,r,m,q',Y,Z) such that τ_p(q,r,m) = (q',Y,Z). 

A clock is a monotonically increasing, everywhere differentiable function from R (real time) to R 
(clock time). We will employ the convention that clock names are capitalized and that the inverse 
of a clock has the same name but is not capitalized. Also, real times are denoted by small letters 
and clock times by capital letters. 

A system of processes, denoted (P,N,S), consists of a set of processes, one for each name in P, a 
nonempty subset N of P called the nonfaulty processes, and a nonempty subset S of P called the 
self-starting processes. (We will use P to denote both the set of names and the set of processes, 
relying on context to distinguish the two.) The nonfaulty processes represent those processes 
that are required to follow the algorithm. The self-starting processes are intended to model those 
that will begin executing the algorithm on their own, without first receiving a message. A system 
of processes with clocks, denoted (P,N,S,PH), is a system of processes (P,N,S) together with a set 
of clocks PH = {Ph_p}, one for each p in P. Clock Ph_p is called p's physical clock. The transition 
function for p is denoted by τ_p. Throughout this thesis we assume |P| = n. 



2.4 Message System 

We assume that every process can communicate directly with every process (including itself, for 
uniformity) at each step. The message system is modelled by a message buffer, which stores 
each message, together with the real times at which it is sent and delivered. For technical 
convenience, we do not require that messages be sent before being received. This correctness 
condition is imposed later. 

A state of the message buffer consists of a multiset of tuples, each of the form (p,x,q) or 
(TIMER,T,p) or (START,p), with associated real times of sending and delivery. The message (x,p) 
with recipient q is represented by (p,x,q). (TIMER,T,p) indicates a timer set for time T on p's 
physical clock. (START,p) represents a START message with p as the recipient. 

An initial state of the message buffer is a state consisting of some set of START messages. The 
sending and delivery times are all initialized as ∞. 

The behavior of the message buffer is captured as a set of sequences of SEND and RECEIVE 
operations, each operation with its associated real time. Each operation involves a message 
tuple. The result of performing each operation is described below. 

SEND(u,t): the tuple u is placed in the message buffer with sending time t and delivery time ∞ as 
long as there is no u entry already in the message buffer with sending time ∞. If there is, then t is 
made the new sending time of the u entry with the earliest delivery time and sending time ∞. 

RECEIVE(u,t): the tuple u is placed in the message buffer with delivery time t and sending time 
∞, as long as there is no u entry already in the message buffer with delivery time ∞. If there is, 
then t is made the new delivery time of the u entry with the earliest sending time and delivery time 
∞. 

The message delay of a non-START message is the delivery time minus the sending time. A 
positive message delay means the message was sent before it was delivered. A negative message 
delay means the message was delivered before it was sent. A message delay of +∞ means the 
message was sent but never delivered, and a message delay of -∞ means the message was 
delivered but never sent. (The message delay is not defined for START messages that are never 
delivered.) 
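
The SEND and RECEIVE rules and the resulting message delays can be sketched in a few lines of Python. This is a simplified illustration under our own assumptions: the class name MessageBuffer, the list-based entries, and the delays helper are hypothetical, and the sketch ignores the initial START entries.

```python
import math

INF = math.inf


class MessageBuffer:
    """Hypothetical sketch of the message buffer described in the text."""

    def __init__(self):
        # Each entry is [tuple u, sending time, delivery time].
        self.entries = []

    def send(self, u, t):
        # Entries for u whose sending time is still unknown (infinite).
        pending = [e for e in self.entries if e[0] == u and e[1] == INF]
        if pending:
            # Fill in the one with the earliest delivery time.
            min(pending, key=lambda e: e[2])[1] = t
        else:
            self.entries.append([u, t, INF])

    def receive(self, u, t):
        # Entries for u whose delivery time is still unknown (infinite).
        pending = [e for e in self.entries if e[0] == u and e[2] == INF]
        if pending:
            # Fill in the one with the earliest sending time.
            min(pending, key=lambda e: e[1])[2] = t
        else:
            self.entries.append([u, INF, t])

    def delays(self, u):
        # Delay = delivery time minus sending time (non-START tuples only).
        return [d - s for (v, s, d) in self.entries if v == u]


buf = MessageBuffer()
buf.send(("p", "hello", "q"), 1.0)
buf.receive(("p", "hello", "q"), 3.5)     # sent, then delivered
buf.receive(("q", "x", "p"), 2.0)
buf.send(("q", "x", "p"), 4.0)            # delivered before being sent
buf.send(("r", "y", "p"), 5.0)            # sent, never delivered
print(buf.delays(("p", "hello", "q")), buf.delays(("q", "x", "p")),
      buf.delays(("r", "y", "p")))
```

The three cases in the usage lines correspond to a positive delay, a negative delay, and a delay of +∞, respectively.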



2.5 Histories 

In this section we define a history, a construct that models a computation in which nonfaulty 
processes follow their state-transition functions. Constraints to ensure that the message system 
behaves correctly will be added in Section 2.8. 

Fix a system of processes and clocks 𝒮 = (P,N,S,PH). 

An event for P is of the form receive(m,p), the receipt of message m by process p, where p is in 
P. A schedule for P is a mapping from R (real times) to finite sequences of events for P such that 
only a finite number of events occur before any finite time, and for each real time t and process p, 
all TIMER events for p are ordered after all non-TIMER events for p. The first condition rules out a 
process taking an infinite number of steps in a finite amount of time, and the second condition 
allows messages that arrive at the same time as a timer goes off to get in "just under the wire". 

In order to discuss how an event affects the system as a whole, we define a configuration for P to 
consist of a state for each process in P and a state for the message buffer. An initial configuration 
for (P,N,S) consists of an initial state for each process and an initial state for the message buffer. 

An action for P is a triple (F,e,F'), consisting of an event e for P and two configurations F and F' for 
P. F is the preceding and F' the succeeding configuration for the action. 

A history for 𝒮 is a mapping from real times to sequences of actions for (P,N,S) with the following 
properties: 

• the projection onto the events is a schedule; 

• if the sequence of actions is nonempty, then the preceding configuration of the first 
action is an initial configuration, and the succeeding configuration of each action is 
the same as the preceding configuration of the following action; 

• if an action (F,receive(m,p),F') occurs at real time t, then F = F' except for p's state 
and the state of the message buffer; moreover, there exist Y in ℱ(X × P) and Z in ℱ(R) 
such that the buffer in F' is obtained from the buffer in F by executing the following 
operations: 

o if m = START, then RECEIVE((START,p),t); 
if m = TIMER, then RECEIVE((TIMER,Ph_p(t),p),t); 
if m = (x,p') for some p', then RECEIVE((p',x,p),t); 

o SEND((p,x,p'),t) for all messages of the form (x,p') in Y; 

o SEND((TIMER,T,p),t) for all T in Z such that T > r (that is, as long as the timer is 
set for a future time); if T ≤ r, then no operation is performed. 

Furthermore, if p is in N, then (q,r,m,q',Y,Z) is a step of p, where q is p's state in F, r = 
Ph_p(t), and q' is p's state in F'. 

The first condition merely ensures that only a finite number of occurrences take place by any 
finite time. The second condition states that the configurations match up correctly. The final 
condition causes the configurations to change according to the process' transition function, if it is 
nonfaulty. Since a faulty process need not obey its transition function, it can send any messages 
and set any timers. 

Given 𝒮, an initial configuration F, and a schedule s, a history can be constructed inductively by 
starting with F and applying the transition functions as specified by the events in s to determine 
the next configuration. We will denote the history so derived by hist(s,F,𝒮). 

Define, for each process p and history h, first-step(h,p) = min{t : h(t) contains an event for p}. 
This is the earliest time at which a step is taken by p in h. If p never takes a step, then 
first-step(h,p) is ∞. Let first-step(h) = min_{p ∈ P} {first-step(h,p)}. This is the earliest time at which any 
process takes a step in h. Similarly, define, for each history h and nonfaulty process p, 
last-step(h,p) = min{t : h(t) contains a configuration in which p is in a final state}. This is the 
earliest time at which p is in a final state. Define last-step(h) = max_{p ∈ N} {last-step(h,p)}. This is the 
earliest time in h after which all nonfaulty processes are in final states. If some p in N never enters 
a final state in h, then last-step(h,p) and last-step(h) are ∞. 



2.6 Chronicles 

In order to isolate the steps of an individual process in a history from the real times at which they 
occur, we define a chronicle. 

The chronicle of nonfaulty process p in history h is the sequence of tuples of the form 
(q_i, r_i, m_i, q_i', Y_i, Z_i) which is derived as follows: if the i-th action for p occurs in h(t), then m_i is the 
message received in that action, q_i is the state of p in the preceding configuration of the action, r_i 
is p's physical clock reading at real time t, q_i' is the state of p in the succeeding configuration, Y_i is 
the collection of messages to be sent to the message buffer, and Z_i is the collection of timers to be 
set. We know that each tuple is a step of p. 

Two histories, h for 𝒮 = (P,N,S,PH) and h' for 𝒮' = (P,N,S,PH'), are equivalent if, for each process 
p in N, the chronicle of p in h is the same as the chronicle of p in h'. 



2.7 Shifting 

Given a schedule s, nonfaulty process p, and real number ζ, define a new schedule s' = 
shift(s,p,ζ) to be the same as s except that an event for p appears in s'(t) if and only if the same 
event appears in s(t + ζ), and the order of events for p is preserved. The result s' can easily be 
seen to be a schedule also. All events involving p are shifted earlier by ζ if ζ is positive, and 
shifted later by -ζ if ζ is negative. 

A set of clocks PH = {Ph_q}_{q ∈ P} can also be shifted. Let PH' = shift(PH,p,ζ) for p in N be the set of 
clocks defined by PH' = {Ph_q'}_{q ∈ P}, where Ph_q'(t) = Ph_q(t) if q ≠ p, and Ph_p'(t) = Ph_p(t + ζ). 
Process p's clock has been shifted forward by ζ, but no other clocks are altered. 
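
The effect of shifting can be illustrated with a small sketch. The representation is our own assumption: a schedule is a Python dict from real times to lists of (message, process) events, a clock is a function of real time, and the names shift_schedule and shift_clock are hypothetical. The shifted clock is taken to read at real time t what the original clock read at t + zeta; for the rate-1 clock used in the example this is the same as adding zeta to the reading.

```python
from collections import defaultdict


def shift_schedule(s, p, zeta):
    """Move every event of process p earlier by zeta; leave others alone.

    An event for p appears at time t in the result exactly when it
    appeared at time t + zeta in s. The order of p's events is preserved.
    """
    shifted = defaultdict(list)
    for t in sorted(s):
        for (msg, proc) in s[t]:
            new_t = t - zeta if proc == p else t
            shifted[new_t].append((msg, proc))
    return dict(shifted)


def shift_clock(clock, zeta):
    """Return a clock that reads at time t what `clock` reads at t + zeta."""
    return lambda t: clock(t + zeta)


schedule = {1.0: [("START", "p")], 2.0: [("START", "q")], 4.0: [("TIMER", "p")]}
ph_p = lambda t: t + 10.0            # hypothetical rate-1 physical clock for p

shifted = shift_schedule(schedule, "p", 0.5)
ph_p_shifted = shift_clock(ph_p, 0.5)

# p's events move in real time, but p sees the same clock reading at each one.
print(shifted)
print(ph_p(1.0), ph_p_shifted(0.5))
print(ph_p(4.0), ph_p_shifted(3.5))
```

Shifting the schedule and the clock by the same amount leaves the clock readings at p's own steps unchanged, which is the observation behind the equivalence of the two resulting histories.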

Lemma 2-1 states that if a schedule and a set of clocks are shifted by the same amount relative to 
the same process, then the histories derived from those schedules and sets of clocks starting 
from the same initial configuration are equivalent. 

Lemma 2-1: Let 𝒮 = (P,N,S,PH) and 𝒮' = (P,N,S,PH'), where PH' = shift(PH,p,ζ) for 
some process p and real number ζ. Let s be a schedule for P and s' = shift(s,p,ζ). Let 
F be an initial configuration for 𝒮 and 𝒮'. Then the history hist(s,F,𝒮) = h is equivalent 
to the history hist(s',F,𝒮') = h'. 

Proof: Let q be an arbitrary process in N. It suffices to show that the chronicle of q in h 
is the same as the chronicle of q in h'. 

Case 1: q ≠ p. We proceed by induction on the elements of the chronicles. Let q's 
chronicle in h be (m_i, qc_i, Ph_q(t_i), qn_i, Y_i, Z_i) and in h' be (m_i', qc_i', Ph_q'(t_i'), qn_i', Y_i', Z_i'). (qc 
stands for current state, qn for next state.) 

Basis: i = 1. Then t_1 = first-step(h,q) and t_1' = first-step(h',q). By construction of h', 
these real times are the same. Therefore, m_1 = m_1'. Since F is the initial configuration 
in both h and h', qc_1 = qc_1'. Ph_q(t_1) = Ph_q'(t_1') since Ph_q = Ph_q' by construction. 
Finally, qn_1 = qn_1', Y_1 = Y_1', and Z_1 = Z_1' since τ_q is deterministic and the inputs are 
the same. 

Induction: Assume the elements are the same up to i - 1, and show that the i-th 
elements are the same. Again, m_i = m_i' by construction of h'; qc_i = qc_i' by the 
induction hypothesis since qc_i = qn_{i-1} = qn_{i-1}' = qc_i'; Ph_q(t_i) = Ph_q'(t_i') as before; 
finally qn_i = qn_i', Y_i = Y_i', and Z_i = Z_i' because τ_q is deterministic. 

Case 2: q » p. Again we proceed by induction on the elements of the chronicles. Let 
p's chronicle in h be (m j ,qc i ,Ph p (t,),qn jl Y jI Z.) and in h' be <m.•,qc i \Ph p , (t i , ),qn j \Y j , ) Z 1 , ). 

First we note that by construction, t, = t,' + f for all i. 

Basis: i = 1. By construction, m 1 = m/. Since F is the initial configuration in both h 
and h\ q Cl . qc/. Ph (t,) - Phy<t/) since Ph p (t t ) - Ph^Vf) - Ph p '(t/ + J" -f). 
Finally, qn 1 * qn/, Y 1 - Y/, and Z n * Z/ since r p is deterministic and the inputs are 
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the same. 

Induction: assume the elements are the same up to i - 1, and show that the i-th 
elements are the same. m. = m.' by construction of h'; qc, ■ qc/ by the induction 
hypothesis; Ph (t) * Ph '(tj') by the same argument as in the basis case; and again qn, 
= qn^, Y, = Y, 1 ! and Z. « Z,'* since r p is deterministic. I 

The next lemma quantifies the changes to the message delays in a history when its schedule and
set of clocks are shifted by the same amount relative to the same process.

Lemma 2-2: Let 𝒮 = (P,N,S,PH) and 𝒮' = (P,N,S,PH'), where PH' = shift(PH,p,ζ) for
some p in P and real number ζ. Let s be a schedule for P and s' = shift(s,p,ζ). Let F
be an initial configuration for 𝒮 and 𝒮'. Then there is a one-to-one correspondence
between the tuples in the message buffer in h = hist(s,F,𝒮) and h' = hist(s',F,𝒮'), and
the message delays for corresponding elements will be the same in the two histories (if
defined) except for two cases:

1. if the delay for any tuple of the form (p,x,q) is μ in h for any process q ≠ p and
message value x, then the delay for the corresponding element in h' will be μ + ζ; and

2. if the delay for any tuple of the form (q,x,p) is μ in h for any process q ≠ p and
message value x, then the delay for the corresponding element in h' will be μ - ζ.

Proof: By Lemma 2-1, h and h' are equivalent. Therefore, the chronicles of all the 
processes are the same. The same messages are sent and received at the same 
physical clock times in h' and h. Also, the message buffers have the same START 
elements since the initial configuration is the same for both. Therefore, each element 
of the message buffer in h has a corresponding one in h' and vice versa. 

START messages are still either received at some finite time or not, thus START 
elements have the same delays in the two histories. Since only p's clock is shifted, the 
clocks of the other processes will bear the same relationship to real time in h' as in h, 
causing the delays for messages between processes other than p and the delays of 
timers for processes other than p to be the same in the two histories. The delays of 
timers for p will be the same as well, since they are both set and received ζ earlier in h'
than in h.

Choose q ≠ p.

1. Suppose (p,x,q) is sent at t and received at t' in h. The relationship between s
and s' implies that (p,x,q) is sent at t - ζ and received at t' in h'. Thus the
message delay in h' is t' - (t - ζ) = μ + ζ.

2. Suppose (q,x,p) is sent at t and received at t' in h. The relationship between s
and s' implies that (q,x,p) is sent at t and received at t' - ζ in h'. Thus the
message delay in h' is (t' - ζ) - t = μ - ζ. ∎
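
The two cases of Lemma 2-2 can be checked numerically. The following is only a sketch: the helper name `shifted_delay` and the representation of a message by its sender, receiver, and two real times are assumptions made for this illustration, not part of the formal model.

```python
def shifted_delay(sender, receiver, sent_at, received_at, p, zeta):
    """Delay of a message after shifting all of p's events earlier by zeta.

    Hypothetical helper: real times of events at process p move from t to
    t - zeta; events at every other process keep their real times.
    """
    new_sent = sent_at - zeta if sender == p else sent_at
    new_received = received_at - zeta if receiver == p else received_at
    return new_received - new_sent


# The original delay is mu = 5.0 in each case; p is shifted by zeta = 2.0.
from_p = shifted_delay("p", "q", 10.0, 15.0, p="p", zeta=2.0)  # mu + zeta
to_p = shifted_delay("q", "p", 10.0, 15.0, p="p", zeta=2.0)    # mu - zeta
other = shifted_delay("q", "r", 10.0, 15.0, p="p", zeta=2.0)   # unchanged
timer = shifted_delay("p", "p", 10.0, 15.0, p="p", zeta=2.0)   # unchanged
```

Only messages with exactly one endpoint at p change their delay; timers of p and messages among the other processes keep theirs.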
2.8 Executions

Now we require correct behavior of the message system. Accordingly, we define an execution to 
be a history with the necessary properties. 

We fix for the remainder of the thesis two nonnegative constants δ and ε with δ > ε.

An execution for 𝒮 is a history for 𝒮 with four additional properties:

• the initial state of the message buffer consists exactly of a START message for each
process in S ∪ (P - N), that is, for each self-starting process and each faulty process;

• all START messages for nonfaulty processes are received at some finite time;

• the message delay of any non-TIMER and non-START message is between δ - ε and
δ + ε inclusive; and

• any (TIMER,T,p) element of the message buffer, for any T and p, has finite message
delay and is delivered at Ph_p^{-1}(T).

The intent of the first condition is to model the self-starting processes as those processes that
begin the algorithm on their own, and to allow the faulty processes to begin their bad behavior at
arbitrary times. The second condition states that nonfaulty self-starting processes all receive their
START messages. The third condition guarantees that all interprocess messages arrive at their
destinations within δ of being sent, subject to an uncertainty of ε. The fourth condition ensures
that a timer goes off if and only if it was previously set and that it goes off at the right time.



2.9 Logical Clocks 

Each process p has as part of its state a local variable CORR, which provides a correction to its
physical clock to yield the local time. During an execution, p's local variable CORR takes on
different values. Thus, for a particular execution, it makes sense to define a function CORR_p(t),
giving the value of p's variable CORR at time t. For a particular execution, we define the local
time for p to be the function L_p, which is given by Ph_p + CORR_p.

A logical clock of p is Ph_p plus the value of CORR_p at some time. Let C^0_p denote the initial logical
clock of p, given by Ph_p plus the value of CORR in p's initial state. Each time p adjusts its CORR
variable, it is, in effect, changing to a new logical clock C^i_p for some i. The local time can be
thought of as a piecewise continuous function, each of whose pieces is part of a logical clock.
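
The piecewise structure of the local time can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function `local_time` and the representation of CORR as a sorted list of (real time, value) pairs are hypothetical and do not appear in the model itself.

```python
import bisect


def local_time(t, phys_clock, corr_changes):
    """Local time L_p(t) = Ph_p(t) + CORR_p(t).

    corr_changes is a list of (real_time, corr_value) pairs sorted by real
    time; the first pair gives the initial CORR (an assumed representation).
    """
    times = [when for when, _ in corr_changes]
    idx = bisect.bisect_right(times, t) - 1   # last change at or before t
    idx = max(idx, 0)                         # before the first entry: initial CORR
    return phys_clock(t) + corr_changes[idx][1]


phys = lambda t: t + 100.0                 # a drift-free physical clock Ph_p
changes = [(0.0, 5.0), (10.0, 3.5)]        # logical clock C^0_p, then C^1_p from t = 10

before = local_time(4.0, phys, changes)    # read on logical clock C^0_p
after = local_time(12.0, phys, changes)    # read on logical clock C^1_p
```

Each entry of `changes` corresponds to one logical clock; the local time jumps by the change in CORR at the moment of adjustment and otherwise follows the physical clock.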
Chapter Three
Lower Bound 



3.1 Introduction 

In this chapter, we show a lower bound on how closely clocks can be synchronized, even if the 
clocks don't drift and no processes are faulty. Since these are strong assumptions, this lower 
bound also holds for the more realistic case in which clocks do drift and arbitrary faults occur. 
Just to show that the bound is tight, we present a simple algorithm that synchronizes the clocks 
as closely as the lower bound. 



3.2 Problem Statement 

For this chapter alone we make the following assumptions: 

1. clocks don't drift, i.e., dC_p(t)/dt = 1 for all p and t;

2. all processes are nonfaulty, i.e., N = P. Therefore, we will omit "N" from the notation.

Since the processes have physical clocks which are progressing at the same rate as real time, the
only part of the clock synchronization problem which is of interest is the problem of bringing the
clocks into synchronization; once this has been done, synchronization is maintained
automatically.

A clock synchronization algorithm (P,S) is γ,α-correct if every execution h for (P,S,PH), for any set
of clocks PH, satisfies the following three conditions:

1. Termination: All processes eventually enter final states. Thus, last-step(h) is defined.

2. Agreement: |L_p(t) - L_q(t)| ≤ γ for any processes p and q and time t ≥ last-step(h).
We say h synchronizes to within γ.

3. Validity: For any process p there exist processes q and r such that C^0_q(t) - α ≤ L_p(t)
≤ C^0_r(t) + α for all times t ≥ last-step(h). This ensures that p's new logical clock
isn't too much greater (or smaller) than the largest (or smallest) old logical clock
would have been at this time. We say h bounds the adjustment within α.

We will show that no algorithm can be γ,α-correct for γ < 2ε(1 - 1/n) and any α, where ε is the
uncertainty in the message delivery time and n is the number of processes. Then we exhibit a
simple algorithm that is 2ε(1 - 1/n),ε-correct.



3.3 Lower Bound 

In this section we show that no algorithm can synchronize n processes' clocks any closer than
2ε(1 - 1/n).

Theorem 3-1: No clock synchronization algorithm can synchronize a system of n
processes to within γ, for any γ < 2ε(1 - 1/n).

Proof: Fix a system of processes (P,S) that synchronizes to within γ. We will show that
γ ≥ 2ε(1 - 1/n).

Let P consist of processes p_1 through p_n. Consider the system 𝒮_1 = (P,S,PH_1).
Consider an execution h_1 = hist(s_1,F,𝒮_1), for some schedule s_1 and initial
configuration F, of any clock synchronization algorithm in which all messages from p_j
to p_k have delay δ - ε if k > j, have delay δ + ε if k < j, and have delay δ if k = j.

Consider n - 1 additional histories, h_2 for system 𝒮_2 through h_n for 𝒮_n. The systems are
constructed inductively by letting PH_i = shift(PH_{i-1}, p_{i-1}, 2ε) and 𝒮_i = (P,S,PH_i). The
histories are constructed inductively by letting s_i = shift(s_{i-1}, p_{i-1}, 2ε) and h_i =
hist(s_i,F,𝒮_i). Stated informally, the i-th history is obtained from the (i-1)-st history by
shifting the schedule and set of clocks by 2ε relative to the (i-1)-st process. Let Ph^i_p
be p's physical clock in PH_i.

By Lemma 2-1, all the h_i are equivalent.

Next we show by induction on i that h_i is an execution for 𝒮_i, and further, that the delays
in h_i for messages from p_j to p_k are δ + ε if j < i and k ≥ i, δ - ε if j ≥ i and k < i,
otherwise as in h_1.

Basis: h_1 is an execution and the message delays are as required by hypothesis.

Induction: Assume h_i is an execution with the required message delays, and show that
h_{i+1} is also an execution with the required message delays.

• The initial state of the message buffer is the same in h_{i+1} as in h_i, since both
use initial configuration F. Thus the initial state is as required.

• The START messages are all received in h_{i+1} as they are in h_i.

• By Lemma 2-2, a message in h_{i+1} from p_i to p_m, m > i, will have delay δ - ε + 2ε
= δ + ε; one from p_i to p_m, m < i, will have delay δ - ε + 2ε = δ + ε; one from
p_m to p_i, m > i, will have delay δ + ε - 2ε = δ - ε; and one from p_m to p_i, m < i,
will have delay δ + ε - 2ε = δ - ε. The others stay the same. Thus the delays
are within the correct range.



• Now we need to show that timers are handled properly in h_{i+1}. Lemma 2-2
implies that the delays of timers are the same in h_{i+1} as in h_i, thus they are
finite. For all processes except p_i, the timers arrive at the same real times and
the same clock times in h_{i+1} as in h_i, and thus they arrive at the proper times in
h_{i+1}. Consider a timer set by p_i for T that arrives at T = Ph^i_{p_i}(t) in h_i. Since the
events of p_i are shifted earlier by 2ε, in h_{i+1} it arrives at t - 2ε. However, since
Ph^{i+1}_{p_i}(t - 2ε) = Ph^i_{p_i}(t - 2ε) + 2ε = Ph^i_{p_i}(t) = T, the timer
arrives at the proper time in h_{i+1}.

Therefore, h_{i+1} is an execution for 𝒮_{i+1}.

Since h_1 was correct, it terminated; therefore, h_i also terminates. Let t_f =
max_{i=1..n} {last-step(h_i)}. In execution h_1, the algorithm synchronizes all the processes'
clocks to values v_1 through v_n at time t_f, and all the values are within γ. In particular,

v_n ≤ v_1 + γ.

Since h_i is equivalent to h_{i-1}, the correction variable for any process p will be the same
in both executions at time t_f. In h_i, the value of p_{i-1}'s logical clock at t_f will be
v_{i-1} + 2ε and the value of p_i's logical clock at t_f will be v_i by the way PH_i is defined.
Since these values are within γ, we have

v_{i-1} ≤ v_i + γ - 2ε.

Putting together this chain of inequalities, we have

v_n ≤ v_1 + γ ≤ ... ≤ v_i + (i-1)(γ - 2ε) + γ ≤ ... ≤ v_n + (n-1)(γ - 2ε) + γ.

Therefore, v_n ≤ v_n + (n - 1)(γ - 2ε) + γ, and so 0 ≤ (n - 1)γ - (n - 1)2ε + γ. In order
for this inequality to hold, it must be the case that γ ≥ 2ε(1 - 1/n). ∎
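
The last step of the proof is a one-line rearrangement. Written out (this only restates the inequality above in LaTeX form; no new assumptions are introduced):

```latex
\begin{align*}
0 &\le (n-1)\gamma - (n-1)\,2\varepsilon + \gamma \\
2\varepsilon\,(n-1) &\le n\gamma \\
\gamma &\ge 2\varepsilon\,\frac{n-1}{n} \;=\; 2\varepsilon\left(1 - \frac{1}{n}\right).
\end{align*}
```

For n = 2 this gives γ ≥ ε, and as n grows the bound approaches 2ε.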



3.4 Upper Bound 

In this section we show that the 2ε(1 - 1/n) lower bound is tight, by exhibiting a simple algorithm
which synchronizes the clocks to within this amount.

3.4.1 Algorithm 

There is an extremely simple algorithm that achieves the closest possible synchronization. As
soon as each process p receives a message, it sends its local time in a message to the remaining
processes and waits to receive a similar message from every other process. Immediately upon
receiving such a message, say from q, p estimates q's current local time by adding δ to the value
received. Then p computes the difference between its estimate of q's local time and its own
current local time. After receiving local times from all the other processes, p takes the average of
the estimated differences (including 0 for the difference between p and itself) and adds this
average to its correction variable. Note that in contrast to many other agreement algorithms, in
this one each process treats itself nonuniformly with the others.

Since it is obviously impractical to write algorithms in terms of transition functions, we have 
employed a clean, simple notation for describing interrupt-driven algorithms. To translate this 
notation into the basic model, we first assume that the state of a process consists of values for all 
the local variables, together with a location counter which indicates the next beginstep statement 
to be executed. The initial state of a process consists of the indicated initial values for all the local 
variables, and the location counter positioned at the first beginstep statement of the program. 

The transition function takes as inputs a state of the process, a message, and a physical time, and 
must return a new state and a collection of messages to send and timers to set. This is done as 
follows. The beginstep statement is extracted from the given state. The local variables are 
initialized at the values given in the state. The parameter u is set equal to the message. The 
variable NOW is initialized at the given physical time + CORR. The program is then run from the 
given beginstep statement, just until it reaches an endstep statement. (If it never reaches an 
endstep statement, the transition function takes on a default value.) The next beginstep after that 
endstep, together with the new values for all the local variables resulting from running the 
program, comprise the new state. The messages sent are all those which are sent during the 
running of the program, and similarly for the timers. 

There is a set-timer statement, which takes an argument U representing a logical time. The 
corresponding physical time, U - CORR, is the physical time described by the transition function. 
(This statement is not used in this algorithm but will be used later in the thesis.) 

We will use the shorthand NOW to stand for the current logical clock time and ME for the id of the 
process running the code. 

For this algorithm, initial states are those in which the location counter is at the beginning of the 
code, local variables CORR and V have arbitrary values, and local variables SUM and 
RESPONSES have value 0. Final states are those in which the location counter is at the end of 
the code. 

The code is in Figure 3-1 . 

    beginstep(u)
        send(NOW) to all q ≠ ME
        do forever
            if u = (v,q) for some message value v and process q then
                V := v + δ - NOW
                SUM := SUM + V
                RESPONSES := RESPONSES + 1
            endif
            if RESPONSES = n - 1 then exit endif
            endstep
            beginstep(u)
        enddo
        CORR := CORR + SUM/n
    endstep

Figure 3-1: Algorithm 3-1, Synchronizing to within the Lower Bound

We will show that any execution h of Algorithm 3-1 is γ,α-correct, where γ = 2ε(1 - 1/n) and α =
ε. Thus, Algorithm 3-1 synchronizes the clocks to within 2ε(1 - 1/n), showing that the lower
bound is tight. The upper bound isn't as unintuitive as it might look at first glance; it can be
rewritten as (2ε + (n - 2)2ε)/n, the average of the discrepancies in the estimated differences.
The estimated differences of two processes for each other can differ by at most ε apiece (giving
the 2ε term), and their estimated differences for the other n - 2 processes can differ by up to 2ε
apiece (giving the (n - 2)2ε term). Then the estimated differences are averaged, so the sum is
divided by n. A more careful analysis is given below.
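
The behavior of Algorithm 3-1 can also be explored with a small simulation. The following Python sketch is not the thesis's formal model: it assumes drift-free clocks, draws each message delay uniformly from [δ - ε, δ + ε], and picks arbitrary send times; the function name `synchronize` and all parameter values are chosen for illustration only.

```python
import random


def synchronize(initial_clocks, delta, eps, rng):
    """One run of the averaging scheme on drift-free clocks.

    initial_clocks[p] is C^0_p(0); since clocks run at rate 1, the offset
    between two logical clocks never changes before CORR is updated.
    Delays are drawn from [delta - eps, delta + eps] (assumed uniform).
    """
    n = len(initial_clocks)
    send_time = [rng.uniform(0.0, 10.0) for _ in range(n)]  # arbitrary starts
    adjusted = []
    for p in range(n):
        total = 0.0                      # SUM; p's own difference counts as 0
        for q in range(n):
            if q == p:
                continue
            delay = rng.uniform(delta - eps, delta + eps)
            arrival = send_time[q] + delay
            value = initial_clocks[q] + send_time[q]      # NOW at q when sent
            now_p = initial_clocks[p] + arrival           # NOW at p on receipt
            total += value + delta - now_p                # V := v + delta - NOW
        adjusted.append(initial_clocks[p] + total / n)    # CORR := CORR + SUM/n
    return adjusted


rng = random.Random(1)
n, delta, eps = 5, 10.0, 1.0
clocks = [rng.uniform(-50.0, 50.0) for _ in range(n)]
result = synchronize(clocks, delta, eps, rng)
skew = max(result) - min(result)
bound = 2 * eps * (1 - 1 / n)
```

In every run the final `skew` should not exceed `bound`, whatever delays are drawn from the allowed interval, which is what Theorem 3-7 below asserts.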



3.4.2 Preliminary Lemmas 

The next two results follow easily from the assumption that clocks don't drift.

Lemma 3-2: For any p and i ≥ 0, C^i_p(t') - C^i_p(t) = t' - t.

Proof: Immediate since the slope of C^i_p is 1. ∎

Lemma 3-3: For any p and q, i ≥ 0, and times t and t', C^i_p(t') - C^i_q(t') = C^i_p(t) - C^i_q(t).

Proof: C^i_p(t') - C^i_p(t) = t' - t = C^i_q(t') - C^i_q(t) by two applications of Lemma 3-2. The
result follows. ∎

Now we can define the initial difference between two processes' clocks in execution h. Define
Δ_pq to be C^0_p(t) - C^0_q(t). That is, Δ_pq is the difference in local times before either of the
processes has changed its correction variable. Since there is no drift in the clock rates, any time
will give the same value.

Lemma 3-4: For any execution h, and processes p and q, Δ_pq = -Δ_qp.
Proof: Immediate from the definition of Δ. ∎

Lemma 3-5: For any execution h, and processes p, q, and r, Δ_pr = Δ_pq + Δ_qr.
Proof: Immediate from the definition of Δ. ∎

3.4.3 Agreement

For q ≠ p, let V_qp be the value of variable V in the code when q's message is being handled by p.
V_qp = L_q(t) + δ - L_p(t'), where local time L_q(t) was sent by q at real time t and received by p
at real time t'. Let V_pp = 0. We will denote SUM/n, p's addition to its correction variable, by A_p.

First we relate the estimate V_qp to the actual value Δ_qp.

Lemma 3-6: |V_qp - Δ_qp| ≤ ε.

Proof: Suppose at real time t, q sent the value L_q(t), which was received by p at real
time t'. Then

|V_qp - Δ_qp| = |L_q(t) + δ - L_p(t') - Δ_qp| = |C^0_q(t) + δ - C^0_p(t') - Δ_qp|
              = |C^0_p(t) + Δ_qp + δ - C^0_p(t') - Δ_qp|, by definition of Δ_qp
              = |C^0_p(t) - C^0_p(t') + δ|
              = |t - t' + δ|, by Lemma 3-2
              = |δ - (t' - t)|
              ≤ |δ - (δ - ε)|, since δ - ε is the smallest message delay (and, symmetrically, δ + ε is the largest)
              = ε. ∎

Here is the main result.

Theorem 3-7: (Agreement) Algorithm 3-1 guarantees clock synchronization to within
2ε(1 - 1/n).

Proof: We must show that for any execution h, any two processes p and q, and all
times t after last-step(h),

|L_p(t) - L_q(t)| ≤ 2ε - 2ε/n.

Without loss of generality, assume p = p_1 and q = p_2, so that the remaining processes
are p_3 through p_n. By the way the algorithm works,

|L_p(t) - L_q(t)| = |(C^0_p(t) + A_p) - (C^0_q(t) + A_q)| = |Δ_pq + A_p - A_q|.

We know by definition of A_p and A_q that

A_p = (1/n)(V_qp + V_pp + Σ_{i=3..n} V_{p_i p}) and
A_q = (1/n)(V_pq + V_qq + Σ_{i=3..n} V_{p_i q}).

Substituting these values and noting that V_pp = V_qq = 0, we get

|L_p(t) - L_q(t)|
  = |Δ_pq + (1/n)(V_qp + Σ_{i=3..n} V_{p_i p} - V_pq - Σ_{i=3..n} V_{p_i q})|
  = (1/n)|nΔ_pq + V_qp + Σ_{i=3..n} V_{p_i p} - V_pq - Σ_{i=3..n} V_{p_i q}|
  = (1/n)|(Δ_pq + V_qp) + (Δ_pq - V_pq) + Σ_{i=3..n} (Δ_pq + V_{p_i p} - V_{p_i q})|
  ≤ (1/n)(|Δ_pq + V_qp| + |Δ_pq - V_pq| + Σ_{i=3..n} |Δ_pq + V_{p_i p} - V_{p_i q}|)
  ≤ (1/n)(ε + ε + Σ_{i=3..n} |Δ_pq + V_{p_i p} - V_{p_i q}|), by Lemmas 3-6 and 3-4
  = (1/n)(2ε + Σ_{i=3..n} |Δ_{p p_i} + Δ_{p_i q} + V_{p_i p} - V_{p_i q}|), by Lemma 3-5
  = (1/n)(2ε + Σ_{i=3..n} |(V_{p_i p} - Δ_{p_i p}) - (V_{p_i q} - Δ_{p_i q})|), by Lemma 3-4
  ≤ (1/n)(2ε + Σ_{i=3..n} |V_{p_i p} - Δ_{p_i p}| + Σ_{i=3..n} |-(V_{p_i q} - Δ_{p_i q})|)
  ≤ (1/n)(2ε + Σ_{i=3..n} ε + Σ_{i=3..n} ε), by Lemma 3-6
  = (1/n)(2ε + (n - 2)2ε)
  = 2ε(1 - 1/n). ∎

3.4.4 Validity 

The validity result states that each new logical clock is within ε of what one of the initial logical
clocks would have been.

Theorem 3-8: (Validity) Algorithm 3-1 bounds the adjustment within ε.

Proof: By definition, the amount to be added to CORR_p is A_p = (1/n) Σ_{q∈P} V_qp. Then
min_{q∈P} V_qp ≤ A_p ≤ max_{q∈P} V_qp. Let q be the process with the minimum V_qp. Let r be
the process with the maximum V_rp. Then,

V_qp ≤ A_p ≤ V_rp.

By applying Lemma 3-6 to each end of this inequality, we get

Δ_qp - ε ≤ V_qp ≤ A_p ≤ V_rp ≤ Δ_rp + ε.

Adding p's initial clock value C^0_p(t) for t ≥ last-step(h), we get

C^0_p(t) + Δ_qp - ε ≤ C^0_p(t) + A_p ≤ C^0_p(t) + Δ_rp + ε,

which together with the definition of Δ implies

C^0_q(t) - ε ≤ L_p(t) ≤ C^0_r(t) + ε. ∎
Chapter Four
Maintenance Algorithm 



4.1 Introduction 

This chapter consists of an algorithm to keep synchronized clocks that are close together initially, 
and an analysis of its performance concerning how closely the clocks are synchronized and how 
close the clocks stay to real time. The algorithm handles clock drift and arbitrary process faults. 
The algorithm requires the clocks to be initially close together and less than one third of the 
processes to be faulty. (Dolev, Halpern and Strong [2] show that it is impossible without 
authentication to synchronize clocks unless more than two thirds of the processes are nonfaulty.) 

This algorithm runs in rounds, resynchronizing periodically to correct for clock drift, and using a 
fault-tolerant averaging function based on those in [1] to calculate an adjustment. The size of the 
adjustment is independent of the number of faulty processes. At each round, n^2 messages are
required, where n is the total number of processes. The closeness of synchronization achieved 
depends only on the initial closeness of synchronization, the message delivery time and its 
uncertainty, and the drift rate. We give explicit bounds on how the difference between the clock 
values and real time grows as time proceeds. The algorithm can be easily adapted to include 
reintegration of repaired processes as described in Section 4.8. 



4.2 Problem Statement 

We are now considering the situation in which clocks can drift slightly and some proportion of the 
processes can be faulty. Therefore, the statement of the problem differs from that in Chapter 3. 

For a very small constant ρ > 0, we define a clock C to be ρ-bounded provided that for all t,
1 - ρ ≤ 1/(1 + ρ) ≤ dC(t)/dt ≤ 1 + ρ ≤ 1/(1 - ρ).

We make the following assumptions: 

1. All clocks are ρ-bounded, including those of faulty processes, i.e., the amount by
which a clock's rate is faster or slower than real time is at most ρ. (Since faulty
processes are permitted to take arbitrary steps, faulty clocks would not increase their
power to affect the behavior of nonfaulty processes.)

2. There are at most f faulty processes, for a fixed constant f, and the total number of
processes in the system, n, is at least 3f + 1.

3. A START message arrives at each process p at time T^0 on its initial logical clock C^0_p,
and t^0_p is the real time when this occurs. Furthermore, the initial logical clocks are
closely synchronized, i.e., |c^0_p(T^0) - c^0_q(T^0)| ≤ β, for some fixed β and all nonfaulty p
and q.

We let tmax^0 = max_{p nonfaulty} {t^0_p} and analogously for tmin^0.

The object is to design an algorithm for which every execution in which the assumptions above 
hold satisfies the following two properties. 

1. γ-Agreement: |L_p(t) - L_q(t)| ≤ γ, for all t ≥ tmin^0 and all nonfaulty p, q.

2. (α_1,α_2,α_3)-Validity: α_1(t - tmax^0) + T^0 - α_3 ≤ L_p(t) ≤ α_2(t - tmin^0) + T^0 + α_3, for all
t ≥ t^0_p and all nonfaulty p.

The Agreement property means that all the nonfaulty processes are synchronized to within γ. The
Validity property means that the local time of a nonfaulty process increases in some relation to
real time. We would, of course, like to minimize α_1, α_2, α_3, and γ.



4.3 Properties of Clocks 

We give several straightforward lemmas about the behavior of (ρ-bounded) clocks.

Lemma 4-1: Let C be any clock.

(a) If t_1 ≤ t_2, then
(1 - ρ)(t_2 - t_1) ≤ (t_2 - t_1)/(1 + ρ) ≤ C(t_2) - C(t_1) ≤ (1 + ρ)(t_2 - t_1) ≤ (t_2 - t_1)/(1 - ρ).

(b) If T_1 ≤ T_2, then
(1 - ρ)(T_2 - T_1) ≤ (T_2 - T_1)/(1 + ρ) ≤ c(T_2) - c(T_1) ≤ (1 + ρ)(T_2 - T_1) ≤ (T_2 - T_1)/(1 - ρ).

Proof: Straightforward. ∎

Lemma 4-2: Let C and D be clocks.

(a) If dC(t)/dt = 1 and T_1 ≤ T_2, then
|(c(T_2) - d(T_2)) - (c(T_1) - d(T_1))| = |(c(T_2) - c(T_1)) - (d(T_2) - d(T_1))| ≤ ρ(T_2 - T_1).

(b) If T_1 ≤ T_2, then
|(c(T_2) - d(T_2)) - (c(T_1) - d(T_1))| = |(c(T_2) - c(T_1)) - (d(T_2) - d(T_1))| ≤ 2ρ(T_2 - T_1).

(c) If dC(t)/dt = 1 and t_1 ≤ t_2, then
|(C(t_2) - D(t_2)) - (C(t_1) - D(t_1))| = |(C(t_2) - C(t_1)) - (D(t_2) - D(t_1))| ≤ ρ(t_2 - t_1).

(d) If t_1 ≤ t_2, then
|(C(t_2) - D(t_2)) - (C(t_1) - D(t_1))| = |(C(t_2) - C(t_1)) - (D(t_2) - D(t_1))| ≤ 2ρ(t_2 - t_1).

Proof: Straightforward using Lemma 4-1. ∎

Lemma 4-3: Let C and D be clocks, T_1 ≤ T_2. Assume |c(T) - d(T)| ≤ α for all T, T_1 ≤
T ≤ T_2. Let t_1 = min{c(T_1),d(T_1)} and t_2 = max{c(T_2),d(T_2)}.
Then |C(t) - D(t)| ≤ (1 + ρ)α for all t, t_1 ≤ t ≤ t_2.

Proof: There are four cases, which can easily be shown to be exhaustive.

Case 1: c(T_1) ≤ t ≤ c(T_2).
Let T_3 = C(t), so that T_1 ≤ T_3 ≤ T_2. By hypothesis, |c(T_3) - d(T_3)| ≤ α. Then |T_3 -
D(t)| ≤ (1 + ρ)α, by Lemma 4-1.

Case 2: d(T_1) ≤ t ≤ d(T_2). This case is analogous to the first.

Case 3: c(T_2) < t < d(T_1).
Then c(T_1) < t < d(T_1). So C(t) > D(t), and thus
|C(t) - D(t)| = C(t) - D(t) = (C(t) - T_1) + (T_1 - D(t))
             ≤ (1 + ρ)(t - c(T_1)) + (1 + ρ)(d(T_1) - t), by Lemma 4-1,
             = (1 + ρ)(d(T_1) - c(T_1)) ≤ (1 + ρ)α.

Case 4: d(T_2) < t < c(T_1). This case is analogous to the third. ∎

4.4 The Algorithm 

4.4.1 General Description 

The algorithm executes in a series of rounds, the i-th round for a process triggered by its logical
clock reaching some value T^i. (It will be shown that the logical clocks reach this value within real
time β of each other.) When any process p's logical clock reaches T^i, p broadcasts a T^i message.
Meanwhile, p collects T^i messages from as many processes as it can, within a particular bounded
amount of time, measured on its logical clock. The bounded amount of time is of length (1 + ρ)(β
+ δ + ε), and is chosen to be just large enough to ensure that T^i messages are received from all
nonfaulty processes. After waiting this amount of time, p averages the arrival times of all the T^i
messages received, using a particular fault-tolerant averaging function. The resulting average is
used to calculate an adjustment to p's correction variable, thereby switching p to a new logical
clock.

The process p then waits until its new clock reaches time T^{i+1} = T^i + P, and repeats the
procedure. P, then, is the length of a round in local time.

The fault-tolerant averaging function is derived from those used in [1] for reaching approximate 
agreement. The function is designed to be immune to some fixed maximum number, f , of faults. It 
first throws out the f highest and f lowest values, and then applies some ordinary averaging 
function to the remaining values. In this paper, we choose the midpoint of the range of the 
remaining values, to be specific. 
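
The averaging function just described can be sketched in Python as follows. The names `reduce_values` and `mid` and the sample arrival times are assumptions made for this illustration; the thesis's own notation (REDUCE and MID) is introduced in the next subsection.

```python
def reduce_values(values, f):
    """Drop the f highest and f lowest entries; return the rest sorted."""
    if len(values) <= 2 * f:
        raise ValueError("need more than 2f values")
    ordered = sorted(values)
    return ordered[f:len(ordered) - f]


def mid(values):
    """Midpoint of the range of a non-empty collection of reals."""
    return (min(values) + max(values)) / 2.0


# Seven arrival times, two of them wildly wrong (as faulty processes might cause).
arrivals = [10.2, 10.4, 9.9, 10.1, 10.3, 500.0, -300.0]
kept = reduce_values(arrivals, f=2)
average = mid(kept)
```

With f = 2, both wild entries are discarded along with one honest value at each end, so the midpoint is computed only over values lying within the range of the honest ones.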

4.4.2 Code for an Arbitrary Process 

Global constants: ρ, β, δ, ε, and P, as defined above.

Local variables: 

• CORR, initially arbitrary; correction variable which corrects physical time to logical 
time. 

• ARR[q], initially arbitrary; array containing the arrival times of the most recent
messages, one entry for each process q.

• T, initially undefined; local time at which the process next intends to send a message. 

Conventions: 

• NOW stands for the current logical clock time (i.e., the physical clock reading + 
CORR). NOW is assumed to be set at the beginning of a step, and cannot be 
assigned to. 

• REDUCE, applied to an array, returns the multiset consisting of the elements of the 
array, with the f highest and f lowest elements removed. 

• MID, applied to a multiset of real numbers, returns the midpoint of the set of values
in the multiset.

The code is in Figure 4-1.

    beginstep(u)
    do forever
        /* in case T^i messages are received before this process reaches T^i */
        while u = (m,q) for some message m and process q do
            ARR[q] := NOW
            endstep
            beginstep(u)
        endwhile

        /* fall out of the loop when u = START or TIMER; begin round */
        T := NOW
        broadcast(T)
        set-timer(T + (1 + ρ)(β + δ + ε))
        while u = (m,q) for some message m and process q do
            ARR[q] := NOW
            endstep
            beginstep(u)
        endwhile

        /* fall out of the loop when u = TIMER; end round */
        AV := mid(reduce(ARR))
        ADJ := T + δ - AV
        CORR := CORR + ADJ
        set-timer(T + P)
        endstep
        beginstep(u)
    enddo

Figure 4-1: Algorithm 4-1, Maintaining Synchronization
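
The end-of-round computation in Figure 4-1 can be illustrated by a short Python sketch. It is a simplification under stated assumptions: the function name `end_of_round`, the dictionary of arrival times, and the numeric values are hypothetical, and the step structure (beginstep/endstep, timers) is omitted.

```python
def end_of_round(T, corr, arr, f, delta):
    """Return (new CORR, ADJ) from the arrival times recorded this round."""
    kept = sorted(arr.values())[f:len(arr) - f]   # reduce: drop f highest, f lowest
    av = (kept[0] + kept[-1]) / 2.0               # mid: midpoint of the range
    adj = T + delta - av                          # ADJ := T + delta - AV
    return corr + adj, adj                        # CORR := CORR + ADJ


# n = 4 processes, f = 1; the entry for p4 is far off, as a faulty process might cause.
arr = {"p1": 100.012, "p2": 100.010, "p3": 100.009, "p4": 107.0}
new_corr, adj = end_of_round(T=100.0, corr=2.0, arr=arr, f=1, delta=0.010)
```

Here the surviving arrival times are 100.010 and 100.012, so the average is 100.011 and the clock is set back by about 0.001.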



4.5 Inductive Analysis 

Although the algorithm is fairly simple, its analysis is surprisingly complicated and requires a long 
series of lemmas. 

4.5.1 Bounds on the Parameters 

We assume that the parameters ρ, δ, and ε are fixed, but that we have some freedom in our
choice of P and β, subject to the reasonableness of our assumption that the clocks are initially
synchronized to within β. We would like β to be as small as possible, to keep the clocks as
closely synchronized as we can. However, the smaller β is, the smaller P must be (i.e., the more
frequently we must synchronize).

There is also a lower bound on P. In order for the algorithm to work correctly, we need to have P
sufficiently large to ensure the following.

(1) After a nonfaulty process p resets its clock, the local time at which p schedules its next
broadcast is greater than the local time on the new clock, at the moment of reset.

(2) A message sent by a nonfaulty process q for a round arrives at a nonfaulty process p after p
has already set its clock for that round.

Sufficient bounds on P turn out to be:

P > 2(1 + ρ)(β + ε) + (1 + ρ)max{δ, β + ε} + ρδ, and

P ≤ β/(4ρ) - ε/ρ - ρ(β + δ + ε) - 2β - δ - 2ε.

A required lower bound on β is β ≥ 4ε + 4ρ(3β + δ + 3ε) + 8ρ^2(β + δ + ε).

Any combination of P and β which satisfies these inequalities will work in our algorithm. If P is
regarded as fixed, then β, the closeness of synchronization along the real time axis, is roughly 4ε
+ 4ρP. This value is obtained by solving the upper bound on P for β and neglecting terms of
order ρ.
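
To see how these constraints interact, the following Python sketch evaluates them for one set of values. The function names `p_bounds` and `beta_ok` and the numbers (a drift rate of 10^-6, delays of about 10 ms with 1 ms uncertainty) are illustrative assumptions, not values taken from the thesis.

```python
def p_bounds(rho, delta, eps, beta):
    """Return (lower, upper) limits on the round length P."""
    lower = (2 * (1 + rho) * (beta + eps)
             + (1 + rho) * max(delta, beta + eps)
             + rho * delta)                         # P must exceed this
    upper = (beta / (4 * rho) - eps / rho
             - rho * (beta + delta + eps)
             - 2 * beta - delta - 2 * eps)          # P must not exceed this
    return lower, upper


def beta_ok(rho, delta, eps, beta):
    """Check the required lower bound on beta."""
    return beta >= (4 * eps + 4 * rho * (3 * beta + delta + 3 * eps)
                    + 8 * rho ** 2 * (beta + delta + eps))


rho, delta, eps = 1e-6, 0.010, 0.001     # illustrative values, in seconds
beta = 0.005
lower, upper = p_bounds(rho, delta, eps, beta)
feasible = beta_ok(rho, delta, eps, beta) and lower < upper
```

For these values the round length may be anywhere from roughly 0.022 s up to just under 250 s, and at P near 250 s the approximation 4ε + 4ρP gives about 0.005, matching the chosen β.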

4.5.2 Notation 

Let T^i = T^0 + iP and U^i = T^i + (1 + ρ)(β + δ + ε), for all i ≥ 0.

For each i, every process p broadcasts T^i at its logical clock time T^i (real time t^i_p) and sets a timer
to go off when its logical clock reaches U^i. When the logical clock reaches U^i (at real time u^i_p), the
process resets its CORR variable, thereby switching to a new logical clock, denoted C^{i+1}_p. Also
at real time u^i_p, the process sets a timer for the time on its physical clock when the new logical
clock C^{i+1}_p reaches T^{i+1}. It is at least theoretically possible that this new timer might be set for a
time on the physical clock which has already passed. If the timer is never set in the past, the
process moves through an infinite sequence of clocks C^0_p, C^1_p, etc., where C^0_p is in force in the
interval of real time (-∞, u^0_p), and each C^i_p, i ≥ 1, is in force in the interval of real time [u^{i-1}_p, u^i_p).
If, however, the timer is set in the past at some u^i_p, then no further timers arrive after that real time,
and no further resynchronizations occur. That is, C^{i+1}_p stays in force forever, and u^j_p and t^j_p are
undefined for j ≥ i + 1.

Let tmin^i denote min_{p nonfaulty} {t^i_p}, and analogously for tmax^i, umin^i and umax^i.

For p and q nonfaulty, let ARR^i_p(q) denote the time of arrival of a T^i message from q to p, sent at
q's clock time T^i, where the arrival time is measured on p's local clock C^i_p. (We will prove that C^i_p
has actually been set by the time this message arrives.) Let AV^i_p denote the value of AV
calculated by p using the ARR^i_p values, and let ADJ^i_p denote the corresponding value of ADJ
calculated by p. Thus, C^{i+1}_p = C^i_p + ADJ^i_p.

This section is devoted to proving the following three statements for all i ≥ 0:

(1) The real time t^i_p is defined for all nonfaulty p. (That is, timers are set in the future.)

(2) |t^i_p - t^i_q| ≤ β, for all nonfaulty p and q. (That is, the separation of clocks is bounded by β.)

(3) t^i_p + δ - ε > u^{i-1}_q, for all nonfaulty p and q, and i ≥ 1. (That is, messages arrive after the
appropriate clocks have been set.)

The proof is by induction. For i = 0, (1) and (2) are true by assumption and (3) is vacuously true.

Throughout the rest of this section, we assume (1), (2), and (3) hold for i. We show (1), (2), and (3)
for i + 1 after bounding the size of the adjustment at each round.

4.5.3 Bounding the Adjustment 

In this subsection, we prove several lemmas leading up to a bound on the amount of adjustment 
made by a nonfaulty process to its clock, at each time of resynchronization. 
Lemma 4-4: Let $p$ and $q$ be nonfaulty.

(a) $ARR^i_p(q) \le T^i + (1+\rho)(\beta+\delta+\epsilon)$.

(b) If $\delta - \epsilon \ge \beta$, then $ARR^i_p(q) \ge T^i + (1-\rho)(\delta-\epsilon-\beta)$.

(c) If $\delta - \epsilon \le \beta$, then $ARR^i_p(q) \ge T^i - (1+\rho)(\beta-\delta+\epsilon)$.
Proof: Straightforward using Lemma 4-1. ∎

Lemma 4-5: Let $p$ be nonfaulty. Then there exist nonfaulty $q$ and $r$ with

$$ARR^i_p(q) \le AV^i_p \le ARR^i_p(r).$$

Proof: By throwing out the f highest and f lowest values, the process ensures that the remaining values are in the range of the nonfaulty processes' values. ∎
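
To make the averaging step concrete, here is a minimal Python sketch. It assumes, following the proof above, that reduce discards the f highest and f lowest values and that mid is the midpoint of the remaining range. The function names `reduce_multiset` and `mid` and the sample arrival times are illustrative and do not appear in the thesis.

```python
def reduce_multiset(values, f):
    """Drop the f smallest and f largest entries of a multiset (given as a list)."""
    if len(values) <= 2 * f:
        raise ValueError("need more than 2f values")
    ordered = sorted(values)
    return ordered[f:len(ordered) - f]


def mid(values):
    """Midpoint of the range spanned by the values."""
    return (min(values) + max(values)) / 2


# Hypothetical arrival times with n = 4 and f = 1.
# The last entry stands for an arbitrary value reported by a faulty process.
nonfaulty = [10.0, 10.4, 10.9]
arrivals = nonfaulty + [500.0]
av = mid(reduce_multiset(arrivals, f=1))
```

With these hypothetical values the faulty arrival time is discarded, and `av` stays inside the range of the nonfaulty values, which is the property Lemma 4-5 states.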

We are now able to bound the adjustment.

Lemma 4-6: Let $p$ be nonfaulty. Then $|ADJ^i_p| \le (1+\rho)(\beta+\epsilon) + \rho\delta$.
Proof: $ADJ^i_p = T^i + \delta - AV^i_p$.

Thus, for some nonfaulty $q$ and $r$, Lemma 4-5 implies that

$$T^i + \delta - ARR^i_p(q) \le ADJ^i_p \le T^i + \delta - ARR^i_p(r).$$

Then Lemma 4-4 implies that:

(a) $ADJ^i_p \ge T^i + \delta - (T^i + (1+\rho)(\beta+\delta+\epsilon)) = -(1+\rho)(\beta+\epsilon) - \rho\delta$.

(b) If $\delta - \epsilon \ge \beta$, then $ADJ^i_p \le T^i + \delta - (T^i + (1-\rho)(\delta-\epsilon-\beta)) = (1-\rho)(\beta+\epsilon) + \rho\delta$.

(c) If $\delta - \epsilon \le \beta$, then $ADJ^i_p \le T^i + \delta - (T^i - (1+\rho)(\beta-\delta+\epsilon)) = (1+\rho)(\beta+\epsilon) - \rho\delta$.

The conclusion is immediate. ∎

4.5.4 Timers Are Set in the Future 

Earlier, we gave a lower bound on $P$ and described two conditions which that bound was supposed to guarantee (that timers are set in the future and that messages arrive after the appropriate clocks have been set). In this subsection, we show that the given bound on $P$ is sufficient to guarantee that the first of these two conditions holds.

Lemma 4-7: Let $p$ be nonfaulty. Then $U^i + ADJ^i_p < T^{i+1}$.

Proof:

$$\begin{aligned}
U^i + ADJ^i_p &\le U^i + (1+\rho)(\beta+\epsilon) + \rho\delta, \text{ by Lemma 4-6} \\
&= U^i + (2(1+\rho)(\beta+\epsilon) + (1+\rho)\delta + \rho\delta) - (1+\rho)(\beta+\delta+\epsilon) \\
&< U^i + P - (1+\rho)(\beta+\delta+\epsilon), \text{ by the assumed lower bound on } P \\
&= T^{i+1}. \quad ∎
\end{aligned}$$

This lemma implies that timers are set in the future and that $t^{i+1}_p$ is defined, the first of the three inductive properties which we must verify.
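
The margin in Lemma 4-7 can be checked numerically. The sketch below is only illustrative: it assumes that the lower bound on $P$ from Section 4.5.1 has the form $2(1+\rho)(\beta+\epsilon) + (1+\rho)\max\{\delta, \beta+\epsilon\} + \rho\delta$, a form inferred from how the bound is used in Lemmas 4-7 and 4-13 rather than quoted from that section. The function names and the parameter values are hypothetical.

```python
from fractions import Fraction as F


def max_adjustment(rho, beta, delta, eps):
    """Bound on |ADJ| from Lemma 4-6."""
    return (1 + rho) * (beta + eps) + rho * delta


def p_lower_bound(rho, beta, delta, eps):
    """Assumed form of the lower bound on the round length P (see lead-in)."""
    return (2 * (1 + rho) * (beta + eps)
            + (1 + rho) * max(delta, beta + eps)
            + rho * delta)


def timer_margin(P, rho, beta, delta, eps):
    """T^{i+1} - (U^i + largest ADJ); a positive value means the timer is set in the future."""
    u_offset = (1 + rho) * (beta + delta + eps)  # U^i - T^i
    return P - (u_offset + max_adjustment(rho, beta, delta, eps))


# Hypothetical parameter values in seconds, chosen only for illustration.
rho, beta, delta, eps = F(1, 10**6), F(5, 1000), F(10, 1000), F(1, 1000)
P = p_lower_bound(rho, beta, delta, eps) + F(1, 1000)
margin = timer_margin(P, rho, beta, delta, eps)
```

Exact rational arithmetic is used so that the comparison is not affected by floating-point rounding.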

4.5.5 Bounding the Separation of Clocks 

Next, we prove several lemmas which lead to bounds on the distance between the new clocks of nonfaulty processes. The first lemma gives an upper bound on the error in a process' estimate of the difference in real time between its own clock and another nonfaulty process' clock reaching $T^i$.

Lemma 4-8: Let $p$, $q$ and $r$ be nonfaulty. Then

$$|(ARR^i_p(q) - (T^i + \delta)) - (c^i_q(T^i) - c^i_p(T^i))| \le \epsilon + \rho(\beta+\delta+\epsilon).$$

Proof: Let $a$ be the real time of arrival of $q$'s message at process $p$. Then $a$ is at most $c^i_q(T^i) + \delta + \epsilon$. Define a new auxiliary clock, $D$, with rate exactly equal to 1, and such that $D(a) = C^i_p(a)$. Thus, $ARR^i_p(q) = D(a)$. So the expression we want to bound is at most equal to:

$$|(D(a) - (T^i + \delta)) - (c^i_q(T^i) - d(T^i))| + |c^i_p(T^i) - d(T^i)|.$$

First we demonstrate that the first of these two terms is at most $\epsilon$.

$$\begin{aligned}
|D(a) - (T^i + \delta) - c^i_q(T^i) + d(T^i)| &= |a - d(T^i + \delta) - c^i_q(T^i) + d(T^i)|, \text{ since } D \text{ has rate 1} \\
&= |a - c^i_q(T^i) + T^i - (T^i + \delta)| \\
&\le |c^i_q(T^i) + \delta + \epsilon - c^i_q(T^i) - \delta| \\
&= \epsilon.
\end{aligned}$$

Next we show that the second term, $|c^i_p(T^i) - d(T^i)|$, is at most $\rho(\beta+\delta+\epsilon)$.

Case 1: $c^i_p(T^i) \le a$. So $p$ reaches $T^i$ before $q$'s message arrives.

Let $\gamma = a - c^i_p(T^i)$. Then $\gamma \le \beta + \delta + \epsilon$.

Subcase 1a: $d(T^i) \ge c^i_p(T^i)$. So $C^i_p$ has rate slower than real time.

Then $d(T^i) - c^i_p(T^i)$ is largest when $C^i_p$ goes at the slowest possible rate, $1/(1+\rho)$. In this case, $d(T^i) - c^i_p(T^i) = \gamma - (a - d(T^i))$, where $a - d(T^i) = \gamma/(1+\rho)$. Thus, $d(T^i) - c^i_p(T^i) = \gamma(1 - 1/(1+\rho)) = \gamma\rho/(1+\rho) \le \gamma\rho \le \rho(\beta+\delta+\epsilon)$.

Subcase 1b: $d(T^i) \le c^i_p(T^i)$. So $C^i_p$ has rate faster than real time.

Then $c^i_p(T^i) - d(T^i)$ is largest when $C^i_p$ goes at the fastest possible rate, $1 + \rho$. Then $c^i_p(T^i) - d(T^i) = \gamma(1+\rho) - \gamma = \gamma\rho \le \rho(\beta+\delta+\epsilon)$.

Case 2: $c^i_p(T^i) > a$. So $p$ reaches $T^i$ after $q$'s message arrives.

Let $\gamma = c^i_p(T^i) - a$. Then $\gamma \le \beta - \delta + \epsilon$.

Subcase 2a: $d(T^i) \ge c^i_p(T^i)$. So $C^i_p$ has rate faster than real time.

An argument similar to that for case 1b shows that $d(T^i) - c^i_p(T^i) \le \gamma\rho \le \rho(\beta-\delta+\epsilon)$, which suffices.

Subcase 2b: $d(T^i) \le c^i_p(T^i)$. So $C^i_p$ has rate slower than real time.

An argument similar to that for case 1a shows that $c^i_p(T^i) - d(T^i) \le \gamma\rho \le \rho(\beta-\delta+\epsilon)$, which suffices. ∎

In order to prove the next lemma, we use some results about multisets, which are presented in the Appendix. This is a key lemma because the distance between the clocks is reduced from $\beta$ to $\beta/2$, roughly. The halving is due to the properties of the fault-tolerant averaging function used in the algorithm. Consequently, the averaging function can be considered the heart of the algorithm.

Lemma 4-9: Let $p$ and $q$ be nonfaulty. Then

$$|(c^i_p(T^i) - c^i_q(T^i)) - (ADJ^i_p - ADJ^i_q)| \le \beta/2 + 2\epsilon + 2\rho(\beta+\delta+\epsilon).$$

Proof: We define multisets U, V, and W, and show they satisfy the hypotheses of Lemma A-4. Let

$U = c^i_p(T^i) - (T^i + \delta) + ARR^i_p$,

$V = c^i_q(T^i) - (T^i + \delta) + ARR^i_q$, and

$W = \{c^i_r(T^i) : r \text{ is nonfaulty}\}$.

U and V have size n and W has size n - f.

Let $x = \epsilon + \rho(\beta+\delta+\epsilon)$.

Define an injection from W to U as follows. Map each element $c^i_r(T^i)$ in W to $c^i_p(T^i) - (T^i + \delta) + ARR^i_p(r)$ in U. Since Lemma 4-8 implies that $|(ARR^i_p(r) - (T^i + \delta)) - (c^i_r(T^i) - c^i_p(T^i))| \le \epsilon + \rho(\beta+\delta+\epsilon)$ for all the elements of W, $d_x(W,U) = 0$. Similarly, $d_x(W,V) = 0$.

Since any two nonfaulty processes reach $T^i$ within $\beta$ real time of each other, $\mathrm{diam}(W) \le \beta$.

By Lemma A-4, $|\mathrm{mid}(\mathrm{reduce}(U)) - \mathrm{mid}(\mathrm{reduce}(V))| \le \beta/2 + 2\epsilon + 2\rho(\beta+\delta+\epsilon)$.

Since $\mathrm{mid}(\mathrm{reduce}(U)) = \mathrm{mid}(\mathrm{reduce}(c^i_p(T^i) - (T^i + \delta) + ARR^i_p)) = c^i_p(T^i) - ADJ^i_p$, and similarly $\mathrm{mid}(\mathrm{reduce}(V)) = c^i_q(T^i) - ADJ^i_q$, the result follows. ∎

The next lemma is analogous to the previous one, except that it involves $U^i$ instead of $T^i$.

Lemma 4-10: Let $p$ and $q$ be nonfaulty. Then

$$|(c^i_p(U^i) - c^i_q(U^i)) - (ADJ^i_p - ADJ^i_q)| \le \beta/2 + 2\epsilon + 2\rho(2+\rho)(\beta+\delta+\epsilon).$$

Proof: The given expression is

$$\begin{aligned}
&\le |(c^i_p(T^i) - c^i_q(T^i)) - (ADJ^i_p - ADJ^i_q)| + |(c^i_p(U^i) - c^i_q(U^i)) - (c^i_p(T^i) - c^i_q(T^i))| \\
&\le \beta/2 + 2\epsilon + 2\rho(\beta+\delta+\epsilon) + 2\rho(1+\rho)(\beta+\delta+\epsilon), \text{ by Lemmas 4-9 and 4-2.}
\end{aligned}$$

This reduces to the claimed expression. ∎

Next we bound the distance in real time between two nonfaulty processes switching to their new clocks. It is crucial that the distance between the new clocks reaching $U^i$ be less than $\beta$ in order to accommodate their relative drift during the interval between $U^i$ and $T^{i+1}$.

Lemma 4-11: Let $p$, $q$ be nonfaulty. Then

$$|c^{i+1}_p(U^i) - c^{i+1}_q(U^i)| \le \beta/2 + 2\epsilon + 2\rho(3\beta + 2\delta + 3\epsilon) + 4\rho^2(\beta+\delta+\epsilon).$$

Proof: We define idealized clocks, $D_p$ and $D_q$, as follows. Both have rate exactly 1. Also, $D_p(u^i_p) = C^{i+1}_p(u^i_p) = U^i + ADJ^i_p$, and similarly for $q$. Then

$$|c^{i+1}_p(U^i) - c^{i+1}_q(U^i)| \le |c^{i+1}_p(U^i) - d_p(U^i)| + |d_p(U^i) - d_q(U^i)| + |d_q(U^i) - c^{i+1}_q(U^i)|.$$

We bound each of these three terms separately.

First, consider $|c^{i+1}_p(U^i) - d_p(U^i)|$. Now, $U^i + ADJ^i_p = D_p(u^i_p) = C^{i+1}_p(u^i_p)$. So

$$\begin{aligned}
|c^{i+1}_p(U^i) - d_p(U^i)| &= |(c^{i+1}_p(U^i) - d_p(U^i)) - (c^{i+1}_p(U^i + ADJ^i_p) - d_p(U^i + ADJ^i_p))| \\
&\le \rho|ADJ^i_p|, \text{ by Lemma 4-2} \\
&\le \rho((1+\rho)(\beta+\epsilon) + \rho\delta), \text{ by Lemma 4-6.}
\end{aligned}$$

The same bound holds for the third term.

Finally, consider the middle term, $|d_p(U^i) - d_q(U^i)|$. We know that $d_p(U^i) = d_p(U^i + ADJ^i_p) - ADJ^i_p = u^i_p - ADJ^i_p$, and similarly for $q$.

$$\begin{aligned}
|d_p(U^i) - d_q(U^i)| &= |(u^i_p - u^i_q) - (ADJ^i_p - ADJ^i_q)| \\
&\le \beta/2 + 2\epsilon + 2\rho(2+\rho)(\beta+\delta+\epsilon), \text{ by Lemma 4-10.}
\end{aligned}$$

Combining these three bounds, we get the required bound. ∎

Finally, we can show the second of our inductive properties, bounding the distance between times when clocks reach $T^{i+1}$.

Lemma 4-12: Let $p$, $q$ be nonfaulty. Then $|t^{i+1}_p - t^{i+1}_q| \le \beta$.

Proof:

$$\begin{aligned}
|t^{i+1}_p - t^{i+1}_q| &= |c^{i+1}_p(T^{i+1}) - c^{i+1}_q(T^{i+1})| \\
&\le |(c^{i+1}_p(T^{i+1}) - c^{i+1}_q(T^{i+1})) - (c^{i+1}_p(U^i) - c^{i+1}_q(U^i))| + |c^{i+1}_p(U^i) - c^{i+1}_q(U^i)| \\
&\le 2\rho(P - (1+\rho)(\beta+\delta+\epsilon)) + \beta/2 + 2\epsilon + 2\rho(3\beta + 2\delta + 3\epsilon) + 4\rho^2(\beta+\delta+\epsilon),
\end{aligned}$$

by Lemmas 4-2 and 4-11. The assumed upper bound on $P$ implies that this expression is at most $\beta$. ∎
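
The last step of this proof can be verified with exact arithmetic. The sketch below assumes that the upper bound on $P$ from Section 4.5.1 is $P \le \beta/(4\rho) - \epsilon/\rho - \rho(\beta+\delta+\epsilon) - 2\beta - \delta - 2\epsilon$. That form is an inference (it is the $n$-independent analogue of the bound given later for the mean-based variant), not a quotation, and the parameter values are hypothetical.

```python
from fractions import Fraction as F


def p_upper_bound(rho, beta, delta, eps):
    """Assumed form of the upper bound on the round length P (see lead-in)."""
    return (beta / (4 * rho) - eps / rho
            - rho * (beta + delta + eps) - 2 * beta - delta - 2 * eps)


def separation_at_next_round(P, rho, beta, delta, eps):
    """Final expression in the proof of Lemma 4-12."""
    s = beta + delta + eps
    drift = 2 * rho * (P - (1 + rho) * s)                # Lemma 4-2
    after_reset = (beta / 2 + 2 * eps
                   + 2 * rho * (3 * beta + 2 * delta + 3 * eps)
                   + 4 * rho**2 * s)                      # Lemma 4-11
    return drift + after_reset


# Hypothetical parameter values, for illustration only.
rho, beta, delta, eps = F(1, 10**5), F(1, 10), F(1, 100), F(1, 1000)
P_max = p_upper_bound(rho, beta, delta, eps)
worst = separation_at_next_round(P_max, rho, beta, delta, eps)
```

Under the assumed bound, `worst` equals `beta` exactly when $P$ sits at its maximum, and any smaller $P$ gives a smaller separation.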

4.5.6 Bound on Message Arrival Time 

In this subsection, we show that the third and final inductive assumption holds. That is, we show that messages arrive after the appropriate clocks have been set.

Lemma 4-13: Let $p$ and $q$ be nonfaulty. Then $t^{i+1}_q + \delta - \epsilon > u^i_p$.

Proof: Since $t^{i+1}_q + \delta - \epsilon \ge t^{i+1}_p - \beta + \delta - \epsilon$, it suffices to show that

$$t^{i+1}_p - u^i_p > \beta - \delta + \epsilon.$$

Now, $t^{i+1}_p - u^i_p \ge (P - (1+\rho)(\beta+\delta+\epsilon) - ADJ^i_p)/(1+\rho)$ since the numerator represents the smallest possible difference in the values of the clock $C^{i+1}_p$ at the two given real times.

But the lower bound on $P$ implies that $P > 3(1+\rho)(\beta+\epsilon) + \rho\delta$. Also, the bound on the adjustment shows that $ADJ^i_p \le (1+\rho)(\beta+\epsilon) + \rho\delta$. Therefore,

$$\begin{aligned}
t^{i+1}_p - u^i_p &> (3(1+\rho)(\beta+\epsilon) + \rho\delta - (1+\rho)(\beta+\delta+\epsilon) - (1+\rho)(\beta+\epsilon) - \rho\delta)/(1+\rho) \\
&= \beta - \delta + \epsilon, \text{ as needed.} \quad ∎
\end{aligned}$$

Thus, we have shown that the three inductive hypotheses hold. Therefore, the claims made in this section for a particular $i$ in fact hold for all $i$.



4.6 Some General Properties 

In this section, we state several consequences of the results proved in the preceding section.

First, we state a bound on the closeness with which the various clocks reach corresponding values.

Lemma 4-14: Let $p$, $q$ be nonfaulty, $i \ge 0$. Assume that $T$ is chosen so that $U^{i-1} \le T \le U^i$, if $i \ge 1$, or so that $T^0 \le T \le U^0$, if $i = 0$. Then $|c^i_p(T) - c^i_q(T)| \le \beta + 2\rho(1+\rho)(\beta+\delta+\epsilon)$.

Proof: Basis: $i = 0$. Then $T^0 \le T \le U^0$.

$$\begin{aligned}
|c^0_p(T) - c^0_q(T)| &\le |(c^0_p(T) - c^0_q(T)) - (c^0_p(T^0) - c^0_q(T^0))| + |c^0_p(T^0) - c^0_q(T^0)| \\
&\le 2\rho(T - T^0) + \beta, \text{ by Lemma 4-2 and assumption 3} \\
&\le \beta + 2\rho(1+\rho)(\beta+\delta+\epsilon).
\end{aligned}$$

Induction: $i > 0$. Choose $T$ with $U^{i-1} \le T \le U^i$.

$$\begin{aligned}
|c^i_p(T) - c^i_q(T)| &\le |(c^i_p(T) - c^i_q(T)) - (c^i_p(U^{i-1}) - c^i_q(U^{i-1}))| + |c^i_p(U^{i-1}) - c^i_q(U^{i-1})| \\
&\le 2\rho P + \beta/2 + 2\epsilon + 2\rho(3\beta + 2\delta + 3\epsilon) + 4\rho^2(\beta+\delta+\epsilon), \text{ by Lemmas 4-2 and 4-11.}
\end{aligned}$$

The upper bound on $P$ implies the result. ∎

Next, we prove a bound for a nonfaulty process' (i + 1)-st clock, in terms of nonfaulty processes' i-th clocks.

Lemma 4-15: Let $p$ be nonfaulty, $i \ge 0$. Then there exist nonfaulty processes, $q$ and $r$, such that for $u^i_p \le t \le umax^i$,

$$C^i_q(t) - \alpha \le C^{i+1}_p(t) \le C^i_r(t) + \alpha,$$

where $\alpha = \epsilon + \rho(4\beta + \delta + 5\epsilon) + 4\rho^2(\beta+\delta+\epsilon) + 2\rho^3(\beta+\delta+\epsilon)$.

Proof: $C^{i+1}_p(t) = C^i_p(t) + T^i + \delta - AV^i_p$. Therefore, by Lemma 4-5 there are nonfaulty processes, $q$ and $r$, for which

$$C^i_p(t) + T^i + \delta - ARR^i_p(q) \le C^{i+1}_p(t) \le C^i_p(t) + T^i + \delta - ARR^i_p(r).$$

We show the right-hand inequality first. Let $a = c^i_p(ARR^i_p(r))$, the real time at which the message arrives at $p$ from $r$. Thus, $C^i_p(a) = ARR^i_p(r)$. Note that $C^i_r(a) \ge T^i + (1-\rho)(\delta-\epsilon)$.

$$\begin{aligned}
C^{i+1}_p(t) &\le C^i_p(t) + T^i + \delta - ARR^i_p(r), \text{ from above} \\
&= C^i_r(t) + C^i_p(a) - C^i_r(a) + T^i + \delta - ARR^i_p(r) + (C^i_p(t) - C^i_r(t)) - (C^i_p(a) - C^i_r(a)) \\
&\le C^i_r(t) + C^i_p(a) - C^i_r(a) + T^i + \delta - ARR^i_p(r) + 2\rho(t - a), \text{ by Lemma 4-2 since } t \ge a \\
&\le C^i_r(t) + ARR^i_p(r) - T^i - (1-\rho)(\delta-\epsilon) + T^i + \delta - ARR^i_p(r) + 2\rho(t - a) \\
&= C^i_r(t) + \epsilon + \rho\delta - \rho\epsilon + 2\rho(t - a).
\end{aligned}$$

It remains to bound $t - a$. The worst case occurs when $t = umax^i$. The longest possible elapsed real time between a particular nonfaulty process reaching $T^i$ and $U^i$ on the same clock is $(1+\rho)^2(\beta+\delta+\epsilon)$. Thus, $umax^i - tmin^i \le \beta + (1+\rho)^2(\beta+\delta+\epsilon)$. But $a \ge tmin^i + \delta - \epsilon$. Therefore, $t - a \le \beta + (1+\rho)^2(\beta+\delta+\epsilon) - \delta + \epsilon$.

$$\begin{aligned}
\text{Thus, } C^{i+1}_p(t) &\le C^i_r(t) + \epsilon + \rho\delta - \rho\epsilon + 2\rho(\beta + (1+\rho)^2(\beta+\delta+\epsilon) - \delta + \epsilon) \\
&= C^i_r(t) + \epsilon + \rho(4\beta + \delta + 3\epsilon) + 4\rho^2(\beta+\delta+\epsilon) + 2\rho^3(\beta+\delta+\epsilon) \\
&\le C^i_r(t) + \alpha.
\end{aligned}$$

For the left-hand inequality, we see that $C^i_q(t) - \epsilon - \rho\delta - \rho\epsilon - 2\rho(t - a) \le C^{i+1}_p(t)$, where $a = c^i_p(ARR^i_p(q))$. The factor $t - a$ is bounded exactly as before, so that we obtain: $C^i_q(t) - \alpha \le C^{i+1}_p(t)$. ∎



4.7 Agreement and Validity Conditions 

We are now ready to show that the agreement and validity properties hold. The main effort is in 
restating bounds proved earlier concerning the closeness in real times when clocks reach the 
same value, in terms of the closeness of clock values at the same real time. 

4.7.1 Agreement 

The first lemma implies that the local times of two nonfaulty processes are close in those intervals where both use a clock with the same index.

Lemma 4-16: Let $p$, $q$ be nonfaulty. Then

$$|C^i_p(t) - C^i_q(t)| \le (1+\rho)(\beta + 2\rho(1+\rho)(\beta+\delta+\epsilon))$$

for $\max\{u^{i-1}_p, u^{i-1}_q\} \le t < \max\{u^i_p, u^i_q\}$, if $i \ge 1$, and for $\min\{t^0_p, t^0_q\} \le t < \max\{u^0_p, u^0_q\}$, if $i = 0$.

Proof: Basis: $i = 0$. Lemma 4-14 implies that

$$|c^i_p(T) - c^i_q(T)| \le \beta + 2\rho(1+\rho)(\beta+\delta+\epsilon)$$

for all $T$, $U^{i-1} \le T \le U^i$ if $i \ge 1$ and for all $T$, $T^0 \le T \le U^0$ if $i = 0$. Then Lemma 4-3 immediately implies the needed result for $i = 0$.

Induction: $i \ge 1$. Lemma 4-3 implies the result for all $t$ with

$$\min\{c^i_p(U^{i-1}), c^i_q(U^{i-1})\} \le t < \max\{u^i_p, u^i_q\}.$$

It remains to show the bound for $t$ with

$$\max\{u^{i-1}_p, u^{i-1}_q\} \le t < \min\{c^i_p(U^{i-1}), c^i_q(U^{i-1})\}.$$

Without loss of generality, assume that $c^i_p(U^{i-1}) \le c^i_q(U^{i-1})$, so that the minimum is equal to $c^i_p(U^{i-1})$.

$$\begin{aligned}
|C^i_p(t) - C^i_q(t)| \le{} & |(C^i_p(t) - C^i_q(t)) - (C^i_p(c^i_p(U^{i-1})) - C^i_q(c^i_p(U^{i-1})))| \\
& + |C^i_p(c^i_p(U^{i-1})) - C^i_q(c^i_p(U^{i-1}))|.
\end{aligned}$$

The first term, by Lemma 4-2, is at most $2\rho(c^i_p(U^{i-1}) - t)$. Since $t \ge \max\{u^{i-1}_p, u^{i-1}_q\} \ge u^{i-1}_p = c^{i-1}_p(U^{i-1})$, we have

$$2\rho(c^i_p(U^{i-1}) - t) \le 2\rho(c^i_p(U^{i-1}) - c^{i-1}_p(U^{i-1})).$$

Since $c^{i-1}_p(U^{i-1}) = c^i_p(T)$ for some $T$ with $|T - U^{i-1}| \le |ADJ^{i-1}_p|$, this quantity is

$$\begin{aligned}
&\le 2\rho|c^i_p(U^{i-1}) - c^i_p(T)| \\
&\le 2\rho(1+\rho)|U^{i-1} - T|, \text{ by Lemma 4-1} \\
&\le 2\rho(1+\rho)|ADJ^{i-1}_p| \\
&\le 2\rho(1+\rho)((1+\rho)(\beta+\epsilon) + \rho\delta), \text{ by Lemma 4-6.}
\end{aligned}$$

To bound the second term we note that Lemma 4-11 implies that

$$|c^i_p(U^{i-1}) - c^i_q(U^{i-1})| \le \beta/2 + 2\epsilon + 2\rho(3\beta + 2\delta + 3\epsilon) + 4\rho^2(\beta+\delta+\epsilon) = a,$$

and so Lemma 4-3, with $T_1 = T_2 = U^{i-1}$, implies that

$$|C^i_p(c^i_p(U^{i-1})) - C^i_q(c^i_p(U^{i-1}))| \le (1+\rho)a.$$

The assumed lower bound on $\beta$ gives the result that

$$2\rho(1+\rho)((1+\rho)(\beta+\epsilon) + \rho\delta) + (1+\rho)a \le (1+\rho)(\beta + 2\rho(1+\rho)(\beta+\delta+\epsilon)). \quad ∎$$

Here is the main result, bounding the error in the synchronization at any time.

Theorem 4-17: The algorithm guarantees $\gamma$-agreement, where

$$\gamma = \beta + \epsilon + \rho(7\beta + 3\delta + 7\epsilon) + 8\rho^2(\beta+\delta+\epsilon) + 4\rho^3(\beta+\delta+\epsilon).$$

Proof: The result for intervals in which the processes use clocks with the same indices has been covered in the preceding lemma. The expression in the statement of that lemma simplifies to

$$\beta + \rho(3\beta + 2\delta + 2\epsilon) + 4\rho^2(\beta+\delta+\epsilon) + 2\rho^3(\beta+\delta+\epsilon),$$

which is less than $\gamma$.

Next, we must consider the case where one of the processes has changed to a new clock, while the other still retains the old clock. Consider $|C^{i+1}_p(t) - C^i_q(t)|$ for some $t$ with $u^i_p \le t < u^i_q$. Lemma 4-15 implies that there exist nonfaulty processes $r$ and $s$ such that

$$C^i_r(t) - \alpha \le C^{i+1}_p(t) \le C^i_s(t) + \alpha,$$

where $\alpha = \epsilon + \rho(4\beta + \delta + 5\epsilon) + 4\rho^2(\beta+\delta+\epsilon) + 2\rho^3(\beta+\delta+\epsilon)$.

$$\begin{aligned}
|C^{i+1}_p(t) - C^i_q(t)| &\le \alpha + \max\{|C^i_r(t) - C^i_q(t)|, |C^i_s(t) - C^i_q(t)|\} \\
&\le \alpha + (1+\rho)(\beta + 2\rho(1+\rho)(\beta+\delta+\epsilon)), \text{ by the preceding lemma} \\
&= \beta + \epsilon + \rho(7\beta + 3\delta + 7\epsilon) + 8\rho^2(\beta+\delta+\epsilon) + 4\rho^3(\beta+\delta+\epsilon), \text{ as needed.} \quad ∎
\end{aligned}$$

In some applications, it may never be the case that clocks with different indices are compared, perhaps because use of the clocks for processing ceases during the interval in which confusion is possible. In that case, the closeness of synchronization achieved by Algorithm 4-1 is given by Lemma 4-16, and is approximately $\beta + \rho(3\beta + 2\delta + 2\epsilon)$. This value is more than $\epsilon$ less than the bound obtained when clocks with different indices must be compared.
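
The relationship between the two agreement bounds can be checked with exact arithmetic. In this sketch the three formulas are transcribed from Lemma 4-15, Lemma 4-16 and Theorem 4-17; the function names and the parameter values are hypothetical.

```python
from fractions import Fraction as F


def alpha(rho, beta, delta, eps):
    """Offset from Lemma 4-15."""
    s = beta + delta + eps
    return eps + rho * (4 * beta + delta + 5 * eps) + 4 * rho**2 * s + 2 * rho**3 * s


def same_index_bound(rho, beta, delta, eps):
    """Bound from Lemma 4-16 for clocks with the same index."""
    return (1 + rho) * (beta + 2 * rho * (1 + rho) * (beta + delta + eps))


def gamma(rho, beta, delta, eps):
    """Agreement bound from Theorem 4-17."""
    s = beta + delta + eps
    return (beta + eps + rho * (7 * beta + 3 * delta + 7 * eps)
            + 8 * rho**2 * s + 4 * rho**3 * s)


# Hypothetical parameter values, for illustration only.
params = (F(1, 10**5), F(1, 10), F(1, 100), F(1, 1000))  # rho, beta, delta, eps
gap = gamma(*params) - same_index_bound(*params)
```

The gap between the two bounds is exactly the offset `alpha(...)`, which exceeds `eps` for any positive parameters.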

Now we can sketch why it is reasonable for $\beta$ to be approximately $4\epsilon + 4\rho P$, as mentioned at the end of Section 4.5.1. Assume $P$ is fixed. The i-th clocks reach $T^i$ within $\beta$ of each other. After the processes reset their clocks, the new clocks reach $U^i$ within $\beta/2 + 2\epsilon$ (ignoring $\rho$ terms). By the end of the round, the clocks reach $T^{i+1}$ within about $\beta/2 + 2\epsilon + 2\rho P$ of each other, because of drift. This quantity must be at most $\beta$. The inequality $\beta/2 + 2\epsilon + 2\rho P \le \beta$ yields $\beta \ge 4\epsilon + 4\rho P$.

Suppose we alter the algorithm so that during each round, the processes exchange clock values $k$ times instead of just once. Then we get $\beta/2^k + (4 - 2^{2-k})\epsilon + 2\rho P \le \beta$, which simplifies to $\beta \ge 4\epsilon + 2\rho P(2^k/(2^k - 1))$. It appears that $\beta \ge 4\epsilon + 2\rho P$ is approachable.
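
The effect of exchanging clock values $k$ times per round can be tabulated directly. This is a minimal sketch of the two formulas above (with $\rho$ terms beyond $2\rho P$ ignored, as in the text). The function names and parameter values are hypothetical.

```python
from fractions import Fraction as F


def separation_after_round(beta, k, eps, rho, P):
    """beta/2^k + (4 - 2^(2-k)) eps + 2 rho P: separation at the end of a round."""
    return beta / 2**k + (4 - F(4, 2**k)) * eps + 2 * rho * P


def min_beta(k, eps, rho, P):
    """Smallest beta with separation_after_round(beta, ...) <= beta."""
    return 4 * eps + 2 * rho * P * F(2**k, 2**k - 1)


# Hypothetical parameter values, for illustration only.
eps, rho, P = F(1, 1000), F(1, 10**5), F(60)
bounds = [min_beta(k, eps, rho, P) for k in range(1, 6)]
limit = 4 * eps + 2 * rho * P
```

For $k = 1$ the function reproduces $4\epsilon + 4\rho P$, and the entries of `bounds` decrease toward `limit` as $k$ grows without reaching it.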

If the number of processes, n, increases while f, the number of faulty processes, remains fixed, a greater closeness of synchronization can be achieved by modifying Algorithm 4-1 so that it computes the mean instead of the midpoint of the range of values.

As in [1], we show that the convergence rate of algorithms that use the mean instead of the midpoint is roughly f/(n - 2f).

The result is based on the following lemma concerning multisets. 

Lemma 4-18: Let U, V, and W be multisets such that $|U| = |V| = n \ge 3f + 1$ and $|W| = n - f$. If $d_x(W,U) = d_x(W,V) = 0$, then

$$|\mathrm{mean}(\mathrm{reduce}(U)) - \mathrm{mean}(\mathrm{reduce}(V))| \le \mathrm{diam}(W)\, f/(n-2f) + 2x.$$

The analysis of the modified Algorithm 4-1 parallels that just presented. However, the upper bound on $P$ becomes

$$P \le \frac{\beta(n-3f)}{(n-2f)\,2\rho} - \frac{\epsilon}{\rho} - \rho(\beta+\delta+\epsilon) - 2\beta - \delta - 2\epsilon.$$

This bound implies $\beta \ge 2(n-2f)(\epsilon + \rho P)/(n-3f)$, which approaches $\beta \ge 2\epsilon + 2\rho P$ as n approaches infinity.

We now demonstrate that this bound is reasonable. After updating the clock and then waiting until the clocks reach the next $T^i$, the clocks must still be within $\beta$, giving $f\beta/(n-2f) + 2\epsilon + 2\rho P \le \beta$, which implies $\beta \ge (2\epsilon + 2\rho P)(n-2f)/(n-3f)$, which approaches $2\epsilon + 2\rho P$ as n approaches infinity.
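
The dependence on $n$ can be illustrated numerically. This sketch evaluates the approximate inequality from the paragraph above for a fixed f. The function names and parameter values are hypothetical.

```python
from fractions import Fraction as F


def mean_round_separation(beta, n, f, eps, rho, P):
    """f*beta/(n-2f) + 2 eps + 2 rho P: approximate separation after one round with the mean."""
    return F(f, n - 2 * f) * beta + 2 * eps + 2 * rho * P


def min_beta_mean(n, f, eps, rho, P):
    """Smallest beta with mean_round_separation(beta, ...) <= beta; requires n >= 3f + 1."""
    if n < 3 * f + 1:
        raise ValueError("need n >= 3f + 1")
    return (2 * eps + 2 * rho * P) * F(n - 2 * f, n - 3 * f)


# Hypothetical parameter values, for illustration only.
f, eps, rho, P = 1, F(1, 1000), F(1, 10**5), F(60)
bounds = {n: min_beta_mean(n, f, eps, rho, P) for n in (4, 7, 100)}
limit = 2 * eps + 2 * rho * P
```

With f fixed, the required $\beta$ shrinks toward `limit` as n grows, which is the behavior the text describes.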

4.7.2 Validity 

Next, we show the validity condition. The first lemma bounds the values of the zero-index clocks. 

Lemma 4-19: $T^0 + (1-\rho)(t - t^0_p) \le C^0_p(t) \le T^0 + (1+\rho)(t - t^0_p)$ for $t \ge t^0_p$.
Proof: By Lemma 4-1. ∎

The next lemma is the main one. 

Lemma 4-20: Let $p$ be nonfaulty, $i \ge 0$. Then

$$(1-\rho)(t - tmax^0) + T^0 - i\epsilon \le C^i_p(t) \le (1+\rho)(t - tmin^0) + T^0 + i\epsilon$$

for all $t \ge u^{i-1}_p$ if $i \ge 1$, and for all $t \ge t^0_p$ if $i = 0$.

Proof: We proceed by induction on $i$. When proving the result for $i + 1$, we will assume the result for $i$, for all executions of the algorithm (rather than just the execution in question).

Basis: $i = 0$. This case follows immediately by Lemma 4-19.

Induction: Assume the result has been shown for $i$ and show it for $i + 1$.

We argue the right-hand inequality first. The left-hand inequality is entirely analogous.

Assume in contradiction that we have a particular execution in which $C^{i+1}_p(t) > (1+\rho)(t - tmin^0) + T^0 + (i+1)\epsilon$ for some $t \ge u^i_p$. Then by the limitations on rates of clocks, it is clear that $C^{i+1}_p(u^i_p) > (1+\rho)(u^i_p - tmin^0) + T^0 + (i+1)\epsilon$.

Recall that $p$ resets its clock at real time $u^i_p$, by adding $T^i + \delta - AV^i_p$. In this case, the inductive hypothesis implies that the adjustment must be an increment.

By Lemma 4-5, this increment is $\le T^i + \delta - ARR^i_p(q)$ for some nonfaulty $q$. Therefore,

$$C^i_p(u^i_p) + T^i + \delta - ARR^i_p(q) > (1+\rho)(u^i_p - tmin^0) + T^0 + (i+1)\epsilon.$$

Next, we claim that if $p$ had done the adjustment just when the message arrived from $q$ rather than waiting till real time $u^i_p$, the bound would still have been exceeded. That is, $ARR^i_p(q) + T^i + \delta - ARR^i_p(q) > (1+\rho)(t' - tmin^0) + T^0 + (i+1)\epsilon$, where $t' = c^i_p(ARR^i_p(q))$. (This again follows by the limits on the rates of clocks.) Thus,

$$T^i + \delta > (1+\rho)(t' - tmin^0) + T^0 + (i+1)\epsilon.$$

Now consider an alternative execution of the algorithm in which everything is exactly like the one we have been describing, except that immediately after $q$ sends out clock reading $T^i$, $q$'s clock $C^i_q$ begins to move at rate 1. This change cannot affect $p$'s (i + 1)-st clock because $q$ doesn't send any more messages until $t^{i+1}_q$, and these messages aren't received until after the time when $p$ sets its (i + 1)-st clock.

By the lower bound on message delays, $q$'s message to $p$ took at least $\delta - \epsilon$ time. Then at real time $t'$ (defined above), we have $C^i_q(t') \ge T^i + \delta - \epsilon$. But then $C^i_q(t') > (1+\rho)(t' - tmin^0) + T^0 + i\epsilon$.

But then the inductive hypothesis is violated, since $t'$, the time when $p$ receives $q$'s $T^i$ message, is greater than or equal to $u^{i-1}_q$, the time when $q$ sets its round i clock. ∎

Now, we can state the validity condition. Let $\varphi = (P - (1+\rho)(\beta+\epsilon) - \rho\delta)/(1+\rho)$. This is the size of the shortest round in real time since the amount of clock time elapsed during a round is at least $P$ minus the maximum adjustment.

Theorem 4-21: The algorithm preserves $(\alpha_1, \alpha_2, \alpha_3)$-validity, where $\alpha_1 = 1 - \rho - \epsilon/\varphi$, $\alpha_2 = 1 + \rho + \epsilon/\varphi$, and $\alpha_3 = \epsilon$.

Proof: We must show for all $t \ge t^0_p$ and all nonfaulty $p$ that

$$\alpha_1(t - tmax^0) + T^0 - \alpha_3 \le L_p(t) \le \alpha_2(t - tmin^0) + T^0 + \alpha_3.$$

We know from the preceding lemma that for $i \ge 0$, $t \ge u^{i-1}_p$ (or $t^0_p$), and nonfaulty $p$

$$(1-\rho)(t - tmax^0) + T^0 - i\epsilon \le C^i_p(t) \le (1+\rho)(t - tmin^0) + T^0 + i\epsilon.$$

Since $L_p(t)$ is equal to $C^i_p(t)$ for some $i$, we just need to convert $i$ into an expression in terms of $t$, etc. An upper bound on $i$ is $1 + (t - tmax^0)/\varphi$. Then

$$\begin{aligned}
(1+\rho)(t - tmin^0) + T^0 + i\epsilon &\le (1+\rho)(t - tmin^0) + T^0 + (1 + (t - tmax^0)/\varphi)\epsilon \\
&\le (1 + \rho + \epsilon/\varphi)(t - tmin^0) + T^0 + \epsilon, \text{ since } tmin^0 \le tmax^0,
\end{aligned}$$

and

$$\begin{aligned}
(1-\rho)(t - tmax^0) + T^0 - i\epsilon &\ge (1-\rho)(t - tmax^0) + T^0 - (1 + (t - tmax^0)/\varphi)\epsilon \\
&= (1 - \rho - \epsilon/\varphi)(t - tmax^0) + T^0 - \epsilon.
\end{aligned}$$

The result follows. ∎
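
The validity parameters are simple to evaluate for given system constants. The following sketch computes $\varphi$ and $(\alpha_1, \alpha_2, \alpha_3)$ from the definitions above. The function name and the parameter values are hypothetical.

```python
from fractions import Fraction as F


def validity_parameters(P, rho, beta, delta, eps):
    """Return (alpha1, alpha2, alpha3) as defined in Theorem 4-21."""
    # phi: shortest possible round length in real time.
    phi = (P - (1 + rho) * (beta + eps) - rho * delta) / (1 + rho)
    if phi <= 0:
        raise ValueError("P is too small: the shortest round would be empty")
    return 1 - rho - eps / phi, 1 + rho + eps / phi, eps


# Hypothetical parameter values, for illustration only.
rho, beta, delta, eps = F(1, 10**5), F(1, 100), F(1, 100), F(1, 1000)
a1, a2, a3 = validity_parameters(F(60), rho, beta, delta, eps)
```

The rate bounds `a1` and `a2` are symmetric about 1, and a longer round length $P$ moves both closer to the raw drift bounds $1 - \rho$ and $1 + \rho$.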



4.8 Reintegrating a Repaired Process 

Our algorithm can be modified to allow a faulty process which has been repaired to synchronize its clock with the other nonfaulty processes. Let $p$ be the process to be reintegrated into the system. During some round $i$, $p$ will gather messages from the other processes and perform the same averaging procedure described previously to obtain a value for its correction variable such that its clock becomes synchronized. Since $p$'s clock is now synchronized, it will reach $T^{i+1}$ within $\beta$ of every other nonfaulty process. At that point, $p$ is no longer faulty and rejoins the main algorithm, sending out $T^{i+1}$ messages.

We assume that $p$ can awaken at an arbitrary time during an execution, perhaps during the middle of a round. It is necessary that $p$ identify an appropriate round $i$ at which it can obtain all the $T^i$ messages from nonfaulty processes. Since $p$ might awaken during the middle of a round, $p$ will orient itself by observing the arriving messages. More specifically, $p$ seeks an $i$ such that f $T^{i-1}$ messages arrive within an interval of length at most $(1+\rho)(\beta + 2\epsilon)$ as measured on its clock. There will always be such an $i$ because all messages from nonfaulty processes for each round arrive within $\beta + 2\epsilon$ real time of each other, and thus within $(1+\rho)(\beta + 2\epsilon)$ clock time. At the same time as $p$ is orienting itself, it is collecting $T^j$ messages, for all $j$.

Assuming that $p$ itself is still counted as one of the faulty processes, at least one of the f arriving messages must be from a nonfaulty process. Thus, $p$ knows that round $i - 1$ is in progress or has just ended, and that it should use $T^i$ messages to update its clock.

Now $p$ collects only $T^i$ messages. It must wait $(1+\rho)(\beta + 2\epsilon + (1+\rho)(P + (1+\rho)(\beta+\epsilon) + \rho\delta))$, as measured on its clock, after receiving the f-th $T^{i-1}$ message in order to guarantee that it has received $T^i$ messages from all nonfaulty processes. The maximum amount of real time $p$ must wait, $\beta + 2\epsilon + (1+\rho)(P + (1+\rho)(\beta + 2\epsilon) + \rho\delta)$, elapses if the f-th $T^{i-1}$ message is from a nonfaulty process $q$ and it took $\delta - \epsilon$ time to arrive, if $q$'s round $i - 1$ lasts as long as possible, $(1+\rho)(P + (1+\rho)(\beta+\epsilon) + \rho\delta)$ (because its clock is slow and it adds the maximum amount to its clock), and if there is a nonfaulty process $r$ that is $\beta$ behind $q$ in reaching $T^i$ and its $T^i$ message to $p$ takes $\delta + \epsilon$. The process waits this maximum amount of time multiplied by $(1+\rho)$ to account for a fast clock.
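
As a rough illustration of the waiting interval, the sketch below evaluates the clock-time wait stated at the start of the preceding paragraph, built from the longest round length $(1+\rho)(P + (1+\rho)(\beta+\epsilon) + \rho\delta)$. The function name and parameter values are hypothetical.

```python
from fractions import Fraction as F


def reintegration_wait(P, rho, beta, delta, eps):
    """Clock time p waits after the f-th T^{i-1} message before averaging."""
    # Longest possible real-time length of a nonfaulty process's round.
    longest_round = (1 + rho) * (P + (1 + rho) * (beta + eps) + rho * delta)
    # Worst-case real time until the last nonfaulty T^i message arrives.
    real_wait = beta + 2 * eps + longest_round
    # Scale by (1 + rho) in case p's own clock runs fast.
    return (1 + rho) * real_wait


# Hypothetical parameter values, for illustration only.
wait = reintegration_wait(F(60), F(1, 10**5), F(1, 100), F(1, 100), F(1, 1000))
```

With no drift ($\rho = 0$) the expression collapses to $P + 2\beta + 3\epsilon$, which gives a quick sanity check on the formula.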

(Some extra bookkeeping in the algorithm is necessitated by the fact that $T^i$ messages from nonfaulty processes can arrive at $p$ before $p$ has received the f-th $T^{i-1}$ message. This scenario shows why: Suppose $p$ receives the first $T^{i-1}$ message at real time $a$, it is from a nonfaulty process $q$, and its delay is $\delta + \epsilon$, and that the f-th $T^{i-1}$ message is received $\beta + 2\epsilon$ after the first one. Also suppose that $q$'s round $i - 1$ is as short as possible in real time, $(P - (1+\rho)(\beta+\epsilon) - \rho\delta)/(1+\rho)$, that there is a nonfaulty process $r$ that begins round $i$ $\beta$ before $q$ does, and that $r$'s $T^i$ message to $p$ arrives at real time $b$ and has delay $\delta - \epsilon$.

We show that $b < a + \beta + 2\epsilon$, implying that the $T^i$ message is received before the f-th $T^{i-1}$ message.

$$\begin{aligned}
b &= t^i_r + \delta - \epsilon \\
&= t^i_q - \beta + \delta - \epsilon \\
&= t^{i-1}_q + (P - (1+\rho)(\beta+\epsilon) - \rho\delta)/(1+\rho) - \beta + \delta - \epsilon \\
&> t^{i-1}_q + ((1+\rho)(3\beta + 3\epsilon) + \rho\delta - (1+\rho)(\beta+\epsilon) - \rho\delta)/(1+\rho) - \beta + \delta - \epsilon, \text{ by the lower bound on } P \\
&= t^{i-1}_q + \beta + \delta + \epsilon \\
&= a - \delta - \epsilon + \beta + \delta + \epsilon.
\end{aligned}$$

Thus, $b > a + \beta$. However, if $P$ is very close to the lower bound, then $b$ is approximately $a + \beta$, which is less than $a + \beta + 2\epsilon$.)

Immediately after $p$ determines it has waited long enough, it carries out the averaging procedure and determines a value for its correction variable.

We claim that $p$ reaches $T^{i+1}$ on its new clock within $\beta$ of every other nonfaulty process. First, observe that it does not matter that $p$'s clock begins initially unsynchronized with all the other clocks; the arbitrary clock will be compensated for in the subtraction of the average arrival time. Second, observe that it does not matter that $p$ is not sending out a $T^i$ message; $p$ is being counted as one of the faulty processes, which could always fail to send a message. (Processes do not treat themselves specially in our algorithm, so it does not matter that $p$ fails to receive a message from itself.) Finally, observe that it does not matter that $p$ adjusts its correction variable whenever it is ready (rather than at the time specified for correct processes in the ordinary algorithm). The adjustment is only the addition of a constant, so the (additive) effect of the change is the same in either case.

We want to ensure that when a process that is reintegrating itself into the system finishes collecting $T^i$ messages and updates its clock, this new clock hasn't already passed $T^{i+1}$. The reason for ensuring this is that the process is supposed to be nonfaulty by $T^{i+1}$ and send out its clock value at that time.

The code is in Figure 4-2.

INFO is an array, each entry of which is a set of (process name, clock time) pairs. When a $T^j$ message arrives from process $q$, $p$ checks that $q$ hasn't already sent it a $T^j$ message. If not, then $q$'s name and the receiving time are added to the set of senders of $T^j$, INFO[j]. If f distinct $T^j$ messages have been received within the last $(1+\rho)(\beta + 2\epsilon)$ time, then $p$ knows that it should use $T^{j+1}$ messages to update its clock.

    beginstep(u)
    do forever
        if u = (T^i, q) and (q, T) ∉ INFO[i] for any T then
            INFO[i] := INFO[i] ∪ {(q, NOW)}
            if |{(q, T) ∈ INFO[i]: q is any process and
                    T ≥ NOW - (1 + ρ)(β + 2ε)}| = f
            then exit endif
        endif
        endstep
        beginstep(u)
    enddo

    /* p knows it should use round i values */

    do for each (q, T) ∈ INFO[i]
        ARR[q] := T
    enddo
    set-timer(NOW + (1 + ρ)(β + 2ε + (1 + ρ)(P + (1 + ρ)(β + ε) + ρδ)))
    endstep

    beginstep(u)
    while u = (T^i, q) for the chosen i do
        ARR[q] := NOW
        endstep
        beginstep(u)
    endwhile

    /* fall out of loop when timer goes off */

    AV := mid(reduce(ARR))
    ADJ := T^i + δ - AV
    CORR := CORR + ADJ
    set-timer(T^i + P)
    endstep

    /* switch to Algorithm 4-1 */

Figure 4-2: Algorithm 4-2, Reintegrating a Repaired Process
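
The orientation step described for Algorithm 4-2 can be sketched in Python as follows. This is an illustration under stated assumptions, not the thesis's code: messages are modeled as hypothetical (local arrival time, round index, sender) tuples delivered in arrival order, `window` stands for $(1+\rho)(\beta+2\epsilon)$, and the name `find_round` is invented here.

```python
from collections import defaultdict


def find_round(messages, f, window):
    """Return the first round index j for which f distinct senders' T^j messages
    arrived within `window` units of local clock time, or None if there is none.

    `messages` is an iterable of (local_time, round_index, sender) tuples
    in order of arrival.
    """
    info = defaultdict(dict)  # round index -> {sender: local arrival time}
    for now, j, sender in messages:
        if sender in info[j]:
            continue  # q has already sent a T^j message; ignore repeats
        info[j][sender] = now
        recent = [t for t in info[j].values() if t >= now - window]
        if len(recent) >= f:
            return j
    return None


# Hypothetical trace with f = 2 and a window of 1.0 clock units.
trace = [(0.0, 3, "a"), (5.0, 3, "b"), (5.1, 4, "c"), (5.2, 3, "c")]
detected = find_round(trace, f=2, window=1.0)
```

In this trace the early message from "a" falls outside the window, so detection happens only when "c" reports for round 3. The process would then use the messages of the following round to update its clock, as the text describes.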

The current lower bound on $P$, the round length, is not large enough to ensure that when the reintegrating process finishes collecting $T^i$ messages and updates its clock, this new clock hasn't already passed $T^{i+1}$.

There are two ways to solve this problem:

1. make the minimum $P$ approximately three times as large as it currently must be;

2. have the process send out its clock value at $T^{i+2}$. It can be collecting $T^{i+1}$ messages all along, but now it knows a tighter bound on when to stop collecting them (since its (i + 1)-st clock is synchronized with the other nonfaulty processes' clocks). This will work as long as the time at which it stops collecting $T^{i+1}$ messages isn't after the process' (i + 2)-nd clock has reached $T^{i+2}$.

Now we show that $P$ must be about three times as large as the previous lower bound in order to prevent the reintegrating process from waiting too long before updating its clock. The actual criterion we use is that the process must update its clock at least $\beta$ before any other nonfaulty process' (i + 1)-st clock reaches $T^{i+1}$. (Since the process' new clock is synchronized with those of the nonfaulty processes, it will not reach $T^{i+1}$ more than $\beta$ before any other nonfaulty clock does.)

Let $p$ be a process being reintegrated during round $i$ and let $t$ be the real time when $p$ stops collecting $T^i$ messages.

Lemma 4-22: If $t \le c^{i+1}_q(T^{i+1}) - \beta$ for any nonfaulty process $q$, then

$$P \ge \frac{6\beta + \delta + 9\epsilon + \rho(8\beta + 3\delta + 16\epsilon) + \rho^2(6\beta + \delta + 14\epsilon) + \rho^3(4\beta + 3\delta + 8\epsilon) + \rho^4(\beta + \delta + 2\epsilon)}{1 - 5\rho - 3\rho^2 - \rho^3}.$$

Proof: The worst case occurs if $p$ waits as long as possible to finish collecting $T^i$ messages and another nonfaulty process $q$ reaches $T^{i+1}$ as soon as possible.

Suppose $p$ receives the first $T^{i-1}$ message at real time $t'$, and the f-th $T^{i-1}$ message at $t' + (1+\rho)^2(\beta + 2\epsilon)$ (because its clock is slow). According to the reintegration algorithm, $p$ will then wait $(1+\rho)(\beta + 2\epsilon + (1+\rho)(P + (1+\rho)(\beta + 2\epsilon) + \rho\delta))$ on its clock, which means it will wait $(1+\rho)$ times as long in real time.

Thus, $t = t' + (1+\rho)^2(2\beta + 4\epsilon + (1+\rho)(P + (1+\rho)(\beta + 2\epsilon) + \rho\delta))$.

Now assume that the first $T^{i-1}$ message received by $p$ was from a nonfaulty process $q$ and that it took $\delta + \epsilon$ time to arrive. Thus $c^{i-1}_q(T^{i-1}) = t' - \delta - \epsilon$. If round $i - 1$ and round $i$ both take the shortest amount of real time, $(1-\rho)(P - (1+\rho)(\beta+\epsilon) - \rho\delta)$, then

$$c^{i+1}_q(T^{i+1}) = c^{i-1}_q(T^{i-1}) + 2(1-\rho)(P - (1+\rho)(\beta+\epsilon) - \rho\delta).$$

We want to ensure that $c^{i+1}_q(T^{i+1}) - t \ge \beta$, i.e.,

$$\begin{aligned}
& t' - \delta - \epsilon + 2(1-\rho)(P - (1+\rho)(\beta+\epsilon) - \rho\delta) \\
& \quad - t' - (1+\rho)^2(2\beta + 4\epsilon + (1+\rho)(P + (1+\rho)(\beta + 2\epsilon) + \rho\delta)) \ge \beta.
\end{aligned}$$

This inequality simplifies to the stated bound. ∎

This new lower bound on $P$ is about three times the size of the previous one, which was $P > 2\beta + \delta + 2\epsilon + 2\rho(\beta+\delta+\epsilon)$.
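
The simplification claimed in the proof of Lemma 4-22 can be confirmed with exact rational arithmetic. In the sketch below, `slack` is the left side of the final inequality minus $\beta$ (the $t'$ terms cancel) and `p_bound` is the right side of the lemma. Both function names and the parameter values are hypothetical.

```python
from fractions import Fraction as F


def slack(P, rho, beta, delta, eps):
    """c_q^{i+1}(T^{i+1}) - t - beta for the worst case in the proof (t' cancels)."""
    shortest_round = (1 - rho) * (P - (1 + rho) * (beta + eps) - rho * delta)
    wait = (1 + rho)**2 * (2 * beta + 4 * eps
                           + (1 + rho) * (P + (1 + rho) * (beta + 2 * eps) + rho * delta))
    return -delta - eps + 2 * shortest_round - wait - beta


def p_bound(rho, beta, delta, eps):
    """Right-hand side of the bound in Lemma 4-22."""
    numerator = (6 * beta + delta + 9 * eps
                 + rho * (8 * beta + 3 * delta + 16 * eps)
                 + rho**2 * (6 * beta + delta + 14 * eps)
                 + rho**3 * (4 * beta + 3 * delta + 8 * eps)
                 + rho**4 * (beta + delta + 2 * eps))
    return numerator / (1 - 5 * rho - 3 * rho**2 - rho**3)


# Hypothetical parameter values, for illustration only.
rho, beta, delta, eps = F(1, 1000), F(5, 1000), F(10, 1000), F(1, 1000)
P_min = p_bound(rho, beta, delta, eps)
```

The slack is exactly zero at `P_min` and becomes positive for any larger round length, which is what "simplifies to the stated bound" means here.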

If increasing the lower bound on $P$ is unacceptable, the second solution can be employed. Its drawback is that now it will take longer for a process to be reintegrated. A similar argument to the above shows that in order to guarantee that $p$ finishes collecting $T^{i+1}$ messages at least $\beta$ before any nonfaulty process reaches $T^{i+2}$, we must have

$$P \ge (5\beta + \delta + 10\epsilon + 2\rho(5\beta + 2\delta + 9\epsilon))/(2 - 4\rho), \text{ ignoring } \rho^2 \text{ terms.}$$

This lower bound is fairly close to the original one. For absolute certainty that the original lower bound will suffice, the process can wait until $T^{i+3}$.






Chapter Five 
Establishing Synchronization 



5.1 Introduction 

In this chapter we present an algorithm to synchronize clocks in a distributed system of 
processes, assuming the clocks initially have arbitrary values. The algorithm handles arbitrary 
failures of the processes and clock drift. We envision the processes running this algorithm until 
the desired degree of synchronization is obtained, and then switching to the maintenance 
algorithm described in the previous chapter. 



5.2 The Algorithm 

5.2.1 General Description 

The structure of the start-up algorithm is similar to that of the algorithm which maintains 
synchronization. It runs in rounds. During each round, the processes exchange clock values and 
use the same fault-tolerant averaging function as before to calculate the corrections to their 
clocks. However, each round contains an additional phase, in which the processes exchange 
messages to decide that they are ready to begin the next round. This method of beginning rounds 
stands in contrast to that used by the maintenance algorithm, in which rounds begin when local 
clocks reach particular values. A more detailed description follows. 

Nonfaulty processes will begin each round within real time $\delta + 3\epsilon$ of each other. Each nonfaulty
process begins the algorithm, and its round 0, as soon as it first receives a message. (It will be
shown that this must be within $\delta + 3\epsilon$.) At the beginning of each round, each nonfaulty process p
broadcasts its local time. Then p waits a certain length of time guaranteed to be long enough for
it to receive a similar message from each nonfaulty process. At the end of this waiting interval, p
calculates the adjustment it will make to its clock at the current round, but does not make the
adjustment yet.

Then p waits a second interval of time before sending out additional messages, to make sure that
these new messages are not received before the other nonfaulty processes have reached the end
of their first waiting intervals. At the end of its second waiting interval, p broadcasts a READY
message indicating that it is ready to begin the next round. However, if p receives f + 1 READY
messages during its second waiting interval, it terminates its second interval early, and goes
ahead and broadcasts READY. As soon as p receives n - f READY messages, it updates its
clock according to the adjustment calculated earlier, and begins its next round by broadcasting
its new clock value. (This algorithm uses some ideas from [3].)
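As a rough illustration of the timing rules just described, the sketch below computes the two local-clock deadlines of a round and the two READY thresholds. The interval lengths are taken from the code in Figure 5-1; the function names are hypothetical and the sketch is not the thesis's own implementation.

```python
def round_deadlines(t_start, rho, delta, eps):
    """Return (U, V): local clock times at which the first and second
    waiting intervals of a round end, for a round begun at clock time t_start."""
    u = t_start + (1 + rho) * (2 * delta + 4 * eps)
    v = u + (1 + rho) * (
        4 * eps + 4 * rho * (delta + 2 * eps) + 2 * rho**2 * (delta + 2 * eps)
    )
    return u, v


def may_send_ready_early(num_ready, f):
    """f + 1 READY messages include at least one from a nonfaulty process,
    so the second waiting interval may be cut short."""
    return num_ready >= f + 1


def may_begin_next_round(num_ready, n, f):
    """n - f READY messages are needed before the clock is updated."""
    return num_ready >= n - f


# Hypothetical values: with no drift the intervals are 2*delta + 4*eps and 4*eps.
u, v = round_deadlines(100.0, 0.0, 0.01, 0.001)
```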

A process need only keep clock differences for one round at a time. The waiting intervals are
designed so that during round i a nonfaulty process p will not receive a READY message from
another nonfaulty process until p has finished collecting round i clock values. Round i + 1 clock
values are not broadcast until after READY is broadcast, so p will certainly not receive round i + 1
clock values until after it has finished collecting round i clock values. However, round i + 1 clock
values might arrive during the second waiting interval and while the process is collecting READY
messages. As a result, the adjustment is calculated at the end of the first waiting interval and the
difference for any round i + 1 clock value received during round i is decremented by the amount
of the adjustment.
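The reason for decrementing the stored differences can be seen in a small sketch. It assumes, as in the code of Figure 5-1, that the logical clock is the physical clock plus a correction CORR and that DIFF holds (estimated remote clock) minus (local clock); the helper name and the numbers are hypothetical.

```python
def finish_round(diff, corr, adjustment):
    """Apply the adjustment at the end of a round.

    Differences already recorded for the next round were measured against the
    old logical clock, so each is decremented by the adjustment that is about
    to be added to the correction.
    """
    new_diff = {q: d - adjustment for q, d in diff.items()}
    return new_diff, corr + adjustment


physical = 1000.0                 # hypothetical physical clock reading
corr = 2.5                        # logical clock = physical + corr
diff = {"q": 0.25, "r": -0.5}     # differences recorded before the update
adjustment = 0.125

new_diff, new_corr = finish_round(diff, corr, adjustment)

# The estimate of q's clock is the same before and after the update.
old_estimate = physical + corr + diff["q"]
new_estimate = physical + new_corr + new_diff["q"]
```

Because the same amount is added to the local clock and removed from each difference, the estimate of every other process's clock is unchanged by the update.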

5.2.2 Code for an Arbitrary Process 

Global constants: δ, ε, ρ, n, f: as usual.

Local variables (all initially arbitrary): 

• T: clock time at which current round began. 

• U: clock time at which the first waiting period is to end. 

• V: clock time at which the second waiting period is to end. 

• DIFF: array of clock differences between other processes and this one for current 
round. 

• SENT-READY: set of processes from whom READY messages have been received in 
current round. 

• CORR: correction variable. 

• A: adjustment to clock. 

The code is in Figure 5-1.

beginstep(w)
do forever    /* each iteration is a round */
    T := NOW
    broadcast(T)
    U := T + (1 + ρ)(2δ + 4ε)
    set-timer(U)

    /* first waiting interval: collect clock values */
    while ¬(w = TIMER & NOW = U) do
        if w = (m,q) then DIFF[q] := m + δ - NOW endif
        endstep
        beginstep(w)
    endwhile

    /* end of first waiting interval */
    A := mid(reduce(DIFF))
    V := U + (1 + ρ)(4ε + 4ρ(δ + 2ε) + 2ρ²(δ + 2ε))
    set-timer(V)
    SENT-READY := ∅

    /* second waiting interval: collect READY messages and clock values
       for next round */
    while ¬(w = TIMER & NOW = V) do
        if w = (READY,q) then
            SENT-READY := SENT-READY ∪ {q}
            if |SENT-READY| = f + 1 then exit endif
        elseif w = (m,q) then DIFF[q] := m + δ - NOW endif
        endstep
        beginstep(w)
    endwhile

    /* end of second waiting interval due to timer or f + 1 READY messages */
    broadcast(READY)
    endstep
    beginstep(w)

    /* collect n - f READY messages and next round clock values */
    while true do
        if w = (READY,q) then
            SENT-READY := SENT-READY ∪ {q}
            if |SENT-READY| = n - f then exit endif
        elseif w = (m,q) then DIFF[q] := m + δ - NOW endif
        endstep
        beginstep(w)
    endwhile

    /* update clock and begin next round */
    DIFF := DIFF - A
    CORR := CORR + A
    endstep
    beginstep(w)
enddo
Figure 5-1: Algorithm 5-1, Establishing Synchronization

5.3 Analysis

$\mathit{rdy}_q^i(p) \ge v_p^i + \delta - \epsilon$

$\ge v_r^i + \delta - \epsilon$

$\ge t_r^i + (2\delta + 4\epsilon) + (4\epsilon + 4\rho(\delta + 2\epsilon)) + \delta - \epsilon$, by definition of $v_r^i$ and the upper
bound on the drift rate

$= t_r^i + 3\delta + 7\epsilon + 4\rho\delta + 8\rho\epsilon$,

and

$u_q^i = t_r^i + (t_q^i - t_r^i) + (u_q^i - t_q^i)$

$\le t_r^i + (t_q^i - t_r^i) + (1 + \rho)^2(2\delta + 4\epsilon)$, by definition of $u_q^i$ and the lower bound on the
drift rate

$= t_r^i + (t_q^i - t_r^i) + 2\delta + 4\epsilon + 4\rho\delta + 8\rho\epsilon$, ignoring $\rho^2$ terms.

Thus, $t_r^i \ge u_q^i - (t_q^i - t_r^i) - 2\delta - 4\epsilon - 4\rho\delta - 8\rho\epsilon$, implying

$\mathit{rdy}_q^i(p) \ge u_q^i - (t_q^i - t_r^i) - 2\delta - 4\epsilon - 4\rho\delta - 8\rho\epsilon + 3\delta + 7\epsilon + 4\rho\delta + 8\rho\epsilon$

$= u_q^i - (t_q^i - t_r^i) + \delta + 3\epsilon$. ∎

Lemma 5-2: For any nonfaulty processes p and q and any $i \ge 0$,

(a) $|t_p^i - t_q^i| \le \delta + 3\epsilon$, and

(b) $\mathit{rdy}_q^i(p) \ge u_q^i$.

Proof: We proceed by induction on i.

Basis: i = 0.

(a) $|t_p^0 - t_q^0| \le \delta + \epsilon$, because as soon as p wakes up, it sends its round 0 message
to all the other processes. The receipt of this message, which occurs at most $\delta + \epsilon$ later,
causes q to begin round 0 if it hasn't already done so.

(b) Let r be the first nonfaulty process to send READY at round 0. By Lemma 5-1,

$\mathit{rdy}_q^0(p) \ge u_q^0 - (t_q^0 - t_r^0) + \delta + 3\epsilon$

$\ge u_q^0 - (\delta + \epsilon) + \delta + 3\epsilon$, by part (a)

$> u_q^0$.

Induction: Assume for i - 1 and show for i.

(a) Let s be the first nonfaulty process to begin round i. Then s receives n - f READY
messages during its round i - 1, at least n - 2f of them from nonfaulty processes; these are
sent after $u_s^{i-1}$, by part (b) of the induction hypothesis. These n - 2f nonfaulty processes
also send READY messages to all the other processes. By $t_s^i + 2\epsilon$, every nonfaulty
process receives at least $n - 2f \ge f + 1$ READY messages and broadcasts READY.
Thus q receives n - f READY messages by $t_s^i + 2\epsilon + \delta + \epsilon$. Thus,

$t_q^i \le t_s^i + \delta + 3\epsilon$

$\le t_p^i + \delta + 3\epsilon$, by choice of s,

which implies $t_q^i - t_p^i \le \delta + 3\epsilon$.

By reversing the roles of p and q in the above argument we obtain $t_p^i - t_q^i \le \delta + 3\epsilon$.

(b) Let r be the first nonfaulty process to send READY at round i. By Lemma 5-1,

$\mathit{rdy}_q^i(p) \ge u_q^i - (t_q^i - t_r^i) + \delta + 3\epsilon$

$\ge u_q^i - (\delta + 3\epsilon) + \delta + 3\epsilon$, by part (a)

$= u_q^i$. ∎

Next we show that a process waits a sufficient length of time to receive clock values from all
nonfaulty processes before beginning the second waiting interval in a round.

Lemma 5-3: Let p and q be nonfaulty, and $i \ge 0$. Then $\mathit{arr}_p^i(q) \le u_p^i$.

Proof: By the lower bound on the drift rate, $u_p^i \ge t_p^i + 2\delta + 4\epsilon$. Lemma 5-2 implies
that q sends its round i clock value by $t_p^i + \delta + 3\epsilon$. Thus $\mathit{arr}_p^i(q) \le t_p^i + 2\delta + 4\epsilon \le u_p^i$. ∎

The next two lemmas bound how long a round can last for one process. First we bound how long
a process must wait after sending READY to receive n - f READY messages.

Lemma 5-4: For p nonfaulty and $i \ge 0$, $t_p^{i+1} - v_p^i \le 2\delta + 4\epsilon + 4\rho(\delta + 4\epsilon)$.

Proof: The worst case occurs if p is as far ahead of the other nonfaulty processes as
possible, its clock is fast, the other clocks are slow, and the last READY
messages take as long as possible to arrive. However, as soon as they arrive, p begins
the next round. Let q be one of the slow nonfaulty processes. Then

$t_p^{i+1} - v_p^i \le (\delta + \epsilon) + (1 + \rho)^2(4\epsilon + 4\rho(\delta + 2\epsilon)) + (1 + \rho)^2(2\delta + 4\epsilon) + (\delta + 3\epsilon)$
$- (4\epsilon + 4\rho(\delta + 2\epsilon)) - (2\delta + 4\epsilon)$

$= 2\delta + 4\epsilon + 4\rho(\delta + 4\epsilon)$, ignoring $\rho^2$ terms. ∎
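Lemma 5-5: For any nonfaulty process p and any $i \ge 0$,

$t_p^{i+1} - t_p^i \le 4\delta + 12\epsilon + 4\rho(3\delta + 10\epsilon)$.

Proof: $t_p^{i+1} - t_p^i = (t_p^{i+1} - v_p^i) + (v_p^i - u_p^i) + (u_p^i - t_p^i)$

$\le 2\delta + 4\epsilon + 4\rho(\delta + 4\epsilon) + (v_p^i - u_p^i) + (u_p^i - t_p^i)$, by Lemma 5-4

$\le 2\delta + 4\epsilon + 4\rho(\delta + 4\epsilon) + (1 + \rho)^2(4\epsilon + 4\rho(\delta + 2\epsilon)) + (1 + \rho)^2(2\delta + 4\epsilon)$

$= 4\delta + 12\epsilon + 4\rho(3\delta + 10\epsilon)$, ignoring $\rho^2$ terms. ∎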
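The last step of this proof drops terms of order $\rho^2$. The sketch below, under assumed sample values for δ, ε, and ρ, compares the sum of the three pieces before simplification with the simplified bound, and checks that the gap is of second order in ρ. The function names are hypothetical.

```python
def round_length_unsimplified(delta, eps, rho):
    """Sum of the three pieces in the proof of Lemma 5-5, keeping all rho terms."""
    ready_wait = 2 * delta + 4 * eps + 4 * rho * (delta + 4 * eps)           # Lemma 5-4
    second_wait = (1 + rho) ** 2 * (4 * eps + 4 * rho * (delta + 2 * eps))   # v - u
    first_wait = (1 + rho) ** 2 * (2 * delta + 4 * eps)                      # u - t
    return ready_wait + second_wait + first_wait


def round_length_first_order(delta, eps, rho):
    """The bound stated in Lemma 5-5."""
    return 4 * delta + 12 * eps + 4 * rho * (3 * delta + 10 * eps)


# Hypothetical values; rho is exaggerated so the second-order gap is visible.
delta, eps, rho = 0.01, 0.001, 1e-3
gap = round_length_unsimplified(delta, eps, rho) - round_length_first_order(delta, eps, rho)
print(f"second-order remainder: {gap:.3e}")
```

The remainder is $\rho^2(10\delta + 24\epsilon) + 4\rho^3(\delta + 2\epsilon)$, which is negligible for realistic drift rates.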

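Now we give an upper bound on how far apart $\mathit{tmax}^i$ and $\mathit{tmax}^{i+1}$ can be.

Lemma 5-6: For any $i \ge 0$,

$\mathit{tmax}^{i+1} - \mathit{tmax}^i \le 4\delta + 12\epsilon + 4\rho(3\delta + 10\epsilon)$.

Proof: Let p be the nonfaulty process such that $t_p^{i+1} = \mathit{tmax}^{i+1}$. Then

$\mathit{tmax}^{i+1} - \mathit{tmax}^i = t_p^{i+1} - \mathit{tmax}^i \le t_p^{i+1} - t_p^i$

$\le 4\delta + 12\epsilon + 4\rho(3\delta + 10\epsilon)$, by Lemma 5-5. ∎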

Lemma 5-7 bounds the amount of real time between the time a nonfaulty process receives a
round i message from another nonfaulty process and the time the last nonfaulty process begins
round i + 1.

Lemma 5-7: For any $i \ge 0$ and nonfaulty processes p and q,

$\mathit{tmax}^{i+1} - \mathit{arr}_p^i(q) \le 5\delta + 19\epsilon + 4\rho(3\delta + 10\epsilon)$.

Proof: $\mathit{tmax}^{i+1} - \mathit{arr}_p^i(q) = (\mathit{tmax}^{i+1} - t_p^{i+1}) + (t_p^{i+1} - t_p^i) + (t_p^i - t_q^i) - (\mathit{arr}_p^i(q) - t_q^i)$

$\le (\delta + 3\epsilon) + (4\delta + 12\epsilon + 4\rho(3\delta + 10\epsilon)) + (\delta + 3\epsilon) - (\delta - \epsilon)$, by Lemmas 5-2 and
5-5 and the lower bound on the message delay

$= 5\delta + 19\epsilon + 4\rho(3\delta + 10\epsilon)$. ∎

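The next lemma bounds the error in a nonfaulty process' estimate of another nonfaulty process'
local time at a particular real time.

Lemma 5-8: Let p and r be nonfaulty. Then

$|\mathit{DIFF}_p^i(r) + C_p^i(\mathit{tmax}^{i+1}) - C_r^i(\mathit{tmax}^{i+1})| \le \epsilon + \rho(11\delta + 39\epsilon)$.

Proof: $|\mathit{DIFF}_p^i(r) + C_p^i(\mathit{tmax}^{i+1}) - C_r^i(\mathit{tmax}^{i+1})|$

$= |\mathit{VAL}_p^i(r) + \delta - \mathit{ARR}_p^i(r) + C_p^i(\mathit{tmax}^{i+1}) - C_r^i(\mathit{tmax}^{i+1})|$.

If the quantity in the absolute value signs is negative, then the expression is equal to

$C_r^i(\mathit{tmax}^{i+1}) - C_p^i(\mathit{tmax}^{i+1}) + \mathit{ARR}_p^i(r) - \delta - \mathit{VAL}_p^i(r)$

$\le C_r^i(\mathit{tmax}^{i+1}) - C_p^i(\mathit{tmax}^{i+1}) + \mathit{ARR}_p^i(r) - \delta - C_r^i(\mathit{arr}_p^i(r) - \delta - \epsilon)$, since the message delay is at
most $\delta + \epsilon$

$\le C_r^i(\mathit{tmax}^{i+1}) - C_p^i(\mathit{tmax}^{i+1}) + C_p^i(\mathit{arr}_p^i(r)) - \delta - C_r^i(\mathit{arr}_p^i(r)) + (1 + \rho)(\delta + \epsilon)$, since
the clock drift is at most $1 + \rho$

$\le 2\rho(\mathit{tmax}^{i+1} - \mathit{arr}_p^i(r)) + \epsilon + \rho\delta + \rho\epsilon$, by Lemma 4-2

$\le 2\rho(5\delta + 19\epsilon) + \epsilon + \rho\delta + \rho\epsilon$, by Lemma 5-7 and ignoring $\rho^2$ terms

$= \epsilon + \rho(11\delta + 39\epsilon)$.

If the quantity in the absolute value signs is positive, a similar argument shows that
$|\mathit{DIFF}_p^i(r) + C_p^i(\mathit{tmax}^{i+1}) - C_r^i(\mathit{tmax}^{i+1})| \le \epsilon + \rho(11\delta + 37\epsilon)$. ∎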

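The next lemma bounds how far apart two processes' i-th clocks are at the time when the last
process begins round i + 1. The bound is in terms of how far apart the clocks are when the last
process begins round i.

Lemma 5-9: For any nonfaulty p and q, and any i,

$|C_p^i(\mathit{tmax}^{i+1}) - C_q^i(\mathit{tmax}^{i+1})| \le B^i + 8\rho(\delta + 3\epsilon)$.

Proof: $|C_p^i(\mathit{tmax}^{i+1}) - C_q^i(\mathit{tmax}^{i+1})|$

$\le |C_p^i(\mathit{tmax}^i) - C_q^i(\mathit{tmax}^i)| + |(C_p^i(\mathit{tmax}^{i+1}) - C_p^i(\mathit{tmax}^i)) - (C_q^i(\mathit{tmax}^{i+1}) - C_q^i(\mathit{tmax}^i))|$

$\le B^i + 2\rho(\mathit{tmax}^{i+1} - \mathit{tmax}^i)$, by definition of $B^i$ and Lemma 4-2

$\le B^i + 2\rho(4\delta + 12\epsilon)$, by Lemma 5-6 and ignoring $\rho^2$ terms

$= B^i + 8\rho(\delta + 3\epsilon)$. ∎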

Now we can state the main result, bounding $B^{i+1}$ in terms of $B^i$.

Theorem 5-10: $B^{i+1} \le \frac{1}{2}B^i + 2\epsilon + 2\rho(11\delta + 39\epsilon)$.

Proof: $B^{i+1} = \max\{|C_p^{i+1}(\mathit{tmax}^{i+1}) - C_q^{i+1}(\mathit{tmax}^{i+1})|\}$ for nonfaulty p and q.

Let $x = \epsilon + \rho(11\delta + 39\epsilon)$.

We now define three multisets U, V, and W that satisfy the hypotheses of Lemma A-4.

Let

$U = \mathit{DIFF}_p^i + C_p^i(\mathit{tmax}^{i+1})$,

$V = \mathit{DIFF}_q^i + C_q^i(\mathit{tmax}^{i+1})$, and

$W = \{C_r^i(\mathit{tmax}^{i+1})$: r is nonfaulty$\}$.

U and V have size n; W has size n - f.

Define an injection from W to U as follows. Map each element $C_r^i(\mathit{tmax}^{i+1})$ in W to
$\mathit{DIFF}_p^i(r) + C_p^i(\mathit{tmax}^{i+1})$ in U. Since Lemma 5-8 implies that

$|\mathit{DIFF}_p^i(r) + C_p^i(\mathit{tmax}^{i+1}) - C_r^i(\mathit{tmax}^{i+1})| \le x$

for all the n - f nonfaulty processes, $d_x(W,U) = 0$. Similarly, $d_x(W,V) = 0$.

By Lemma 5-9, $\mathit{diam}(W) \le B^i + 8\rho(\delta + 3\epsilon)$. Thus, Lemma A-4 implies

$|\mathit{mid}(\mathit{reduce}(U)) - \mathit{mid}(\mathit{reduce}(V))| \le \frac{1}{2}\mathit{diam}(W) + 2x$

$= \frac{1}{2}B^i + 2\epsilon + 2\rho(11\delta + 39\epsilon)$.

Since $\mathit{mid}(\mathit{reduce}(U)) = \mathit{mid}(\mathit{reduce}(\mathit{DIFF}_p^i + C_p^i(\mathit{tmax}^{i+1})))$

$= \mathit{mid}(\mathit{reduce}(\mathit{DIFF}_p^i)) + C_p^i(\mathit{tmax}^{i+1})$

$= \mathit{ADJ}_p^i + C_p^i(\mathit{tmax}^{i+1})$

$= C_p^{i+1}(\mathit{tmax}^{i+1})$

and similarly $\mathit{mid}(\mathit{reduce}(V)) = C_q^{i+1}(\mathit{tmax}^{i+1})$, the result follows. ∎

We obtain an approximate bound on how closely this algorithm will synchronize the clocks by
considering the limit of $B^i$ as the round number increases without bound.

Theorem 5-11: This algorithm can synchronize clocks to within $4\epsilon + 4\rho(11\delta + 39\epsilon)$.

Proof: $\lim_{i \to \infty} B^i$

$\le \lim_{i \to \infty} [B^0/2^i + (1 + 1/2 + \cdots + 1/2^{i-1})(2\epsilon + 2\rho(11\delta + 39\epsilon))]$

$= 4\epsilon + 4\rho(11\delta + 39\epsilon)$, since the limit of the geometric series is 2. ∎
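To see the convergence concretely, the following sketch iterates the recurrence of Theorem 5-10, treating the inequality as an equality (the worst case), and compares the result with the limit of Theorem 5-11. The starting spread and the parameter values are assumptions chosen only for illustration.

```python
def next_bound(b, delta, eps, rho):
    """One application of the recurrence of Theorem 5-10, taken with equality."""
    return b / 2 + 2 * eps + 2 * rho * (11 * delta + 39 * eps)


def limit_bound(delta, eps, rho):
    """The closeness of synchronization given by Theorem 5-11."""
    return 4 * eps + 4 * rho * (11 * delta + 39 * eps)


# Hypothetical parameters and an arbitrary large initial spread B^0.
delta, eps, rho = 0.01, 0.001, 1e-6
b = 10.0
history = [b]
for _ in range(40):
    b = next_bound(b, delta, eps, rho)
    history.append(b)

limit = limit_bound(delta, eps, rho)
print(f"after 40 rounds: {history[-1]:.9f}, limit: {limit:.9f}")
```

Each round halves the distance to the limit, so the initial spread matters only logarithmically in the number of rounds required.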

As was the case for Algorithm 4-1, if the number of processes, n, increases while f, the number of
faulty processes, remains fixed, a greater closeness of synchronization can be achieved by
modifying Algorithm 5-1 so that it computes the mean instead of the midpoint of the range of
values. (For Algorithm 4-1, the closeness approaches $2\epsilon + 2\rho P$ as n approaches infinity.)

After modifying Algorithm 5-1, we get

$B^i \le B^{i-1} f/(n-2f) + 2\epsilon + 2\rho(11\delta + 39\epsilon)$.

This is the same as

$B^i \le B^0 (f/(n-2f))^i + \frac{1 - (f/(n-2f))^i}{1 - f/(n-2f)} (2\epsilon + 2\rho(11\delta + 39\epsilon))$,

which approaches $2\epsilon + 2\rho(11\delta + 39\epsilon)$ as n approaches infinity.

5.4 Determining the Number of Rounds 

The nonfaulty processes must determine how many rounds of this algorithm must be run to
establish the desired degree of synchronization before switching to the maintenance algorithm.
The basic idea is for each nonfaulty process p to estimate $B^0$, and then calculate a sufficient
number of rounds, $\mathit{NROUNDS}_p$, using the known rate of convergence. $B^0$ is estimated by having
p calculate an overestimate and an underestimate for $C_q^0(\mathit{tmax}^0)$ for each q, and letting the
estimated $B^0$ be the difference between the maximum overestimate and the minimum
underestimate.

Let p's overestimate for $C_q^0(\mathit{tmax}^0)$ be $\mathit{OVER}_p(q)$ and p's underestimate for $C_q^0(\mathit{tmax}^0)$ be
$\mathit{UNDER}_p(q)$.

For the overestimate, we assume that q's clock is fast, and that the maximum amount of time
elapses between $t_q^0$ (when q sent the message) and $\mathit{tmax}^0$. That maximum is $\delta + \epsilon$ since every
nonfaulty process begins round 0 as soon as it receives a message. Thus,

$\mathit{OVER}_p(q) = \mathit{VAL}_p^0(q) + (1 + \rho)(\delta + \epsilon)$.

Similarly, we can derive the underestimate. We assume that q is the last nonfaulty process to
begin round 0. Thus,

$\mathit{UNDER}_p(q) = \mathit{VAL}_p^0(q)$.

Process p computes its estimate of $B^0$,

$B_p^0 = \max_q\{\mathit{OVER}_p(q)\} - \min_q\{\mathit{UNDER}_p(q)\}$.

Now p estimates how many rounds are needed until the spread is close enough. There is a
predetermined $\gamma > 4\epsilon + 4\rho(11\delta + 39\epsilon)$, which is the desired closeness of synchronization for
the start-up algorithm. After i rounds,

$B^i \le B^0/2^i + (1 + 1/2 + \cdots + 1/2^{i-1})(2\epsilon + 2\rho(11\delta + 39\epsilon))$.

Process p sets the right hand side equal to $\gamma$ and solves for i to obtain its estimate of the required
number of rounds, $\mathit{NROUNDS}_p$.

Now each process executes a Byzantine Agreement protocol on the vector of NROUNDS values,
one value for each process. The processes are guaranteed to have the same vector at the end of
the Byzantine Agreement protocol. Each process chooses the (f + 1)-st smallest element of the
resulting vector as the required number of rounds. The smallest number of rounds computed by a
nonfaulty process will suffice to achieve the desired closeness of synchronization. Variations in
the number of rounds computed by different nonfaulty processes are due to spurious values
introduced by faulty processes and to different message delays. However, the range computed
by any nonfaulty process is guaranteed to include the actual values of all nonfaulty processes at
$\mathit{tmax}^0$, so the range determined by the process that computes the smallest number of rounds also
includes all the actual values. In order to guarantee that each process chooses a number of
rounds that is at least as large as the smallest one computed by a nonfaulty process, it chooses
the (f + 1)-st smallest element of the vector of values.
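The procedure of this section can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the thesis's code: the function names are hypothetical, the round count is found by a simple loop rather than by solving for i in closed form, and the Byzantine Agreement step is assumed to have already produced a common vector.

```python
def round_constant(delta, eps, rho):
    """Additive term of the recurrence of Theorem 5-10."""
    return 2 * eps + 2 * rho * (11 * delta + 39 * eps)


def estimate_b0(values, delta, eps, rho):
    """values holds VAL for each q; returns max OVER minus min UNDER."""
    over = [v + (1 + rho) * (delta + eps) for v in values]
    under = list(values)
    return max(over) - min(under)


def rounds_needed(b0, gamma, delta, eps, rho):
    """Smallest i with b0/2^i + (1 + 1/2 + ... + 1/2^(i-1)) * c <= gamma."""
    c = round_constant(delta, eps, rho)
    if gamma <= 2 * c:
        raise ValueError("gamma must exceed 4*eps + 4*rho*(11*delta + 39*eps)")
    i = 0
    # The geometric sum 1 + 1/2 + ... + 1/2^(i-1) equals 2 - 2/2^i (0 when i = 0).
    while b0 / 2**i + (2 - 2 / 2**i) * c > gamma:
        i += 1
    return i


def agreed_rounds(nrounds_vector, f):
    """(f + 1)-st smallest entry of the vector all processes agreed on."""
    return sorted(nrounds_vector)[f]


# Hypothetical example: three received clock values and a target gamma.
delta, eps, rho = 0.01, 0.001, 1e-6
b0 = estimate_b0([10.0, 10.3, 9.9], delta, eps, rho)
n_rounds = rounds_needed(b0, 0.005, delta, eps, rho)
```

For instance, if a faulty process contributes 0 to the vector [0, 9, 9, 10] with f = 1, `agreed_rounds` still returns 9, which is at least the smallest value computed by a nonfaulty process.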

Any Byzantine Agreement protocol requires at least f + 1 rounds. The processes can execute
this algorithm in parallel with the clock synchronization algorithm, beginning at round 0. The
clock synchronization algorithm imposes a round structure on the processes' communications.
The Byzantine Agreement algorithm can be executed using this round structure. Each BA
message can also include information needed for the clock synchronization algorithm (namely,
the current clock value). However, the processes will always need to do at least f + 2 rounds, one
to obtain the estimated number of rounds and f + 1 for the Byzantine Agreement algorithm.

5.5 Switching to the Maintenance Algorithm 

After the processes have done the required number of rounds (denoted by r throughout this
section) of the start-up algorithm, they cease executing it. The processes should begin the
maintenance algorithm as soon as possible after ending the start-up algorithm in order to
minimize the inaccuracy introduced by the clock drift.

In the maintenance algorithm each process broadcasts its clock value when its clock reaches $T^i$,
for i = 0, 1, ..., where $T^{i+1} = T^i + P$. Let $T^0$ be a multiple of P. We show
that the first multiple of P reached by nonfaulty p's clock after finishing the required r rounds
differs by at most one from the first multiple reached by nonfaulty q's clock after the r rounds.
When a process reaches the first multiple of P after it has ended the start-up algorithm, it
broadcasts its clock value as in the maintenance algorithm, but doesn't update its clock. At the
next multiple of P, the process begins the full maintenance algorithm, by broadcasting its clock
value and updating its clock. (It will receive clock values from all nonfaulty processes.)

The analysis introduces a new quantity, $\beta_1$, representing an upper bound on the closeness of the
nonfaulty processes' clocks at $\mathit{tmax}^r$. That is, for any nonfaulty processes p and q,
$|C_p^r(\mathit{tmax}^r) - C_q^r(\mathit{tmax}^r)| \le \beta_1$. We show that if the following five inequalities are satisfied
by the parameters, then the switch from the start-up algorithm to the maintenance algorithm
(with parameter $\beta$) can be accomplished.

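(1) $\beta_1 > 4\epsilon + 4\rho(11\delta + 39\epsilon)$

(2) $\beta \ge (\beta_1 + 2\epsilon + \rho(6P - \beta_1 + 2\delta + 12\epsilon)) / (1 - 8\rho)$

(3) $P \ge 2(1 + \rho)(\beta + \epsilon) + (1 + \rho)\max\{\delta, \beta + \epsilon\} + \rho\delta$

(4) $P \le \beta/4\rho - \epsilon/\rho - \rho(\beta + \delta + \epsilon) - 2\beta - \delta - 2\epsilon$

(5) $\beta \ge 4\epsilon + 4\rho(3\beta + \delta + 3\epsilon) + 8\rho^2(\beta + \delta + \epsilon)$

The first inequality is imposed by the limitation on how closely the start-up algorithm can
synchronize. The second inequality reflects the inaccuracy introduced during the switch. The
last three are simply repeated from Section 4.5.1.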
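A parameter choice can be checked against the five inequalities mechanically. The sketch below is illustrative only: the function name and the sample values are assumptions, and the inequalities are transcribed as they are read from this section.

```python
def switch_conditions(beta1, beta, P, delta, eps, rho):
    """Return a dict mapping each inequality number (1-5) to whether it holds."""
    return {
        1: beta1 > 4 * eps + 4 * rho * (11 * delta + 39 * eps),
        2: beta >= (beta1 + 2 * eps
                    + rho * (6 * P - beta1 + 2 * delta + 12 * eps)) / (1 - 8 * rho),
        3: P >= (2 * (1 + rho) * (beta + eps)
                 + (1 + rho) * max(delta, beta + eps) + rho * delta),
        4: P <= (beta / (4 * rho) - eps / rho
                 - rho * (beta + delta + eps) - 2 * beta - delta - 2 * eps),
        5: beta >= (4 * eps + 4 * rho * (3 * beta + delta + 3 * eps)
                    + 8 * rho**2 * (beta + delta + eps)),
    }


# Hypothetical parameters (seconds): beta1 slightly above 4*eps, beta about 6.5*eps.
ok = switch_conditions(beta1=0.0041, beta=0.0065, P=1.0,
                       delta=0.01, eps=0.001, rho=1e-6)
# Shrinking beta to 5*eps violates only inequality (2).
too_tight = switch_conditions(beta1=0.0041, beta=0.005, P=1.0,
                              delta=0.01, eps=0.001, rho=1e-6)
```

With these sample values inequality (2) is the binding one, which matches the observation later in this section that the resulting $\beta$ is about $6\epsilon$.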

First we show that $\beta_1$ can be attained by the start-up algorithm.

Lemma 5-12: There exists an integer i such that $B^i \le \beta_1$.

Proof: Since $\beta_1$ must be larger than $4\epsilon + 4\rho(11\delta + 39\epsilon)$, the result follows from
Theorem 5-11, which states that the closeness of synchronization approaches
$4\epsilon + 4\rho(11\delta + 39\epsilon)$ as the round number, i, increases. ∎

Note that the number of rounds, r, that the processes agree on is $\ge i$, and that the worst-case $B^r$ is
no more than the worst-case $B^i$, which is at most $\beta_1$.

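Lemma 5-13 shows that the first multiple of P reached by a nonfaulty process after finishing the
start-up algorithm differs by at most one from that reached by another nonfaulty process.

Lemma 5-13: Let p and q be nonfaulty processes. Then

$|C_p^r(t_p^r) - C_q^r(t_q^r)| < P$.

Proof: $|C_p^r(t_p^r) - C_q^r(t_q^r)| \le |C_p^r(t_p^r) - C_q^r(t_p^r)| + (1 + \rho)|t_p^r - t_q^r|$

$\le |C_p^r(t_p^r) - C_q^r(t_p^r)| + (1 + \rho)(\delta + 3\epsilon)$, by Lemma 5-2

$\le |(C_q^r(t_p^r) - C_q^r(\mathit{tmax}^r)) - (C_p^r(t_p^r) - C_p^r(\mathit{tmax}^r))| + |C_q^r(\mathit{tmax}^r) - C_p^r(\mathit{tmax}^r)|$
$+ (1 + \rho)(\delta + 3\epsilon)$

$\le 2\rho(\mathit{tmax}^r - t_p^r) + \beta_1 + (1 + \rho)(\delta + 3\epsilon)$, by Lemma 4-2 and definition of $\beta_1$

$\le 2\rho(\delta + 3\epsilon) + \beta_1 + (1 + \rho)(\delta + 3\epsilon)$, by Lemma 5-2

$= \beta_1 + (1 + 3\rho)(\delta + 3\epsilon)$.

Suppose in contradiction that $P \le \beta_1 + (1 + 3\rho)(\delta + 3\epsilon)$. By solving inequality (2) for
$\beta_1$, we get

$\beta_1 \le (\beta - 2\epsilon - \rho(8\beta + 2\delta + 12\epsilon + 6P)) / (1 - \rho)$,

which implies that

$P \le (\beta - 2\epsilon - \rho(8\beta + 2\delta + 12\epsilon + 6P)) / (1 - \rho) + (1 + 3\rho)(\delta + 3\epsilon)$.

This simplifies to $P \le (\beta + \delta + \epsilon - 8\rho\beta + \rho\delta - 3\rho\epsilon) / (1 + 5\rho)$.

Combining this with inequality (3) yields

$2(1 + \rho)(\beta + \epsilon) + (1 + \rho)\delta + \rho\delta \le P \le (\beta + \delta + \epsilon - 8\rho\beta + \rho\delta - 3\rho\epsilon) / (1 + 5\rho)$.

Solving for $\beta$ gives $\beta \le -(\epsilon + 6\rho\delta + 15\rho\epsilon) / (1 + 20\rho)$, which is a contradiction. ∎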

The rest of the section is devoted to showing that the difference in real times when nonfaulty
processes' clocks reach the first multiple of P at which they will all perform the maintenance
algorithm is less than or equal to $\beta$. Consequently, this $\beta$ can be preserved by the maintenance
algorithm.

Define kP to be the first multiple of P reached by any nonfaulty process' r-th clock. The first
multiple of P reached by any other nonfaulty process is either kP or (k+1)P, by Lemma 5-13. At
(k+1)P some of the nonfaulty processes will actually update their clocks, and at (k+2)P all of
them will update their clocks.

Recall that $(k+1)P = T^{k+1}$ and $U^{k+1} = T^{k+1} + (1 + \rho)(\beta + \delta + \epsilon)$. Let $u_p^{k+1} = c_p^r(U^{k+1})$ and
similarly for q.

Let s and t be two nonfaulty processes. Here is a description of the worst case:

• s has the smallest clock value at $\mathit{tmax}^r$, barely above (k-1)P, and its clock is slow.

• t's clock is fast and is $\beta_1$ ahead of s's at $\mathit{tmax}^r$.

• s updates its clock at $U^{k+1}$, by decrementing it as much as possible.

• t updates its clock at $U^{k+1}$, by incrementing it as much as possible.

First we must bound how far apart in real time nonfaulty processes' r-th clocks reach $U^{k+1}$.

Lemma 5-14: Let p and q be nonfaulty processes. Then

$|c_p^r(U^{k+1}) - c_q^r(U^{k+1})| \le (1 - \rho)\beta_1 + 2\rho(2P + \beta + \delta + \epsilon)$.

Proof: Without loss of generality, suppose $c_p^r(U^{k+1}) \ge c_q^r(U^{k+1})$. Then

$|c_p^r(U^{k+1}) - c_q^r(U^{k+1})| = c_p^r(U^{k+1}) - c_q^r(U^{k+1})$

$= (c_p^r(U^{k+1}) - \mathit{tmax}^r) - (c_q^r(U^{k+1}) - \mathit{tmax}^r)$

$\le (C_p^r(u_p^{k+1}) - C_p^r(\mathit{tmax}^r))(1 + \rho) - (C_q^r(u_q^{k+1}) - C_q^r(\mathit{tmax}^r))(1 - \rho)$, by the bounds on
the drift rate

$\le (2P + (1 + \rho)(\beta + \delta + \epsilon))(1 + \rho) - (2P + (1 + \rho)(\beta + \delta + \epsilon) - \beta_1)(1 - \rho)$

$= (1 - \rho)\beta_1 + 2\rho(2P + \beta + \delta + \epsilon)$, ignoring $\rho^2$ terms. ∎

Next, we bound the additional spread introduced by the resetting of the clocks.

Lemma 5-15: Let s and t be the nonfaulty processes described above. Then

(a) $c_s^{r+1}(U^{k+1}) - c_s^r(U^{k+1}) \le (1 + \rho)(\epsilon + \rho(4\beta + \delta + 5\epsilon))$, and

(b) $c_t^r(U^{k+1}) - c_t^{r+1}(U^{k+1}) \le (1 + \rho)(\epsilon + \rho(4\beta + \delta + 5\epsilon))$.

Proof: (a) By Lemma 4-15, we know that s's new clock is at most $\alpha = \epsilon + \rho(4\beta + \delta + 5\epsilon)$
less than the "smallest" of the previous nonfaulty clocks at $c_s^r(U^{k+1}) = u_s^{k+1}$.
Since s had the smallest clock before, $C_s^{r+1}(u_s^{k+1}) \ge C_s^r(u_s^{k+1}) - \alpha$. By the lower
bound on the drift rate,

$c_s^{r+1}(U^{k+1}) - c_s^r(U^{k+1}) \le (1 + \rho)\alpha$.

(b) Lemma 4-15 also states that t's new clock is at most $\alpha$ more than the "largest" of
the previous nonfaulty clocks at $u_t^{k+1}$, which was t's clock. The argument is similar to
(a). ∎

Finally, we can bound the maximum difference in real time between two nonfaulty processes'
clocks reaching $T^{k+2}$. Let $i_p$ be the index of p's logical clock that is in effect when $T^{k+2}$ is
reached.

Theorem 5-16: Let p and q be nonfaulty processes and $i = i_p$ and $j = i_q$. Then

$|c_p^i(T^{k+2}) - c_q^j(T^{k+2})| \le \beta$.

Proof: Without loss of generality, suppose $c_p^i(T^{k+2}) \ge c_q^j(T^{k+2})$. Then

$|c_p^i(T^{k+2}) - c_q^j(T^{k+2})| \le c_s^{r+1}(T^{k+2}) - c_t^{r+1}(T^{k+2})$

for nonfaulty processes s and t that behave as described above.
We know from Lemma 4-2 that

$(c_s^{r+1}(T^{k+2}) - c_t^{r+1}(T^{k+2})) - (c_s^{r+1}(U^{k+1}) - c_t^{r+1}(U^{k+1}))$
$\le 2\rho(P - (1 + \rho)(\beta + \delta + \epsilon))$.

Thus $c_s^{r+1}(T^{k+2}) - c_t^{r+1}(T^{k+2})$

$\le 2\rho(P - (1 + \rho)(\beta + \delta + \epsilon)) + c_s^{r+1}(U^{k+1}) - c_t^{r+1}(U^{k+1})$

$= 2\rho(P - (1 + \rho)(\beta + \delta + \epsilon)) + c_s^{r+1}(U^{k+1}) - c_s^r(U^{k+1}) + c_t^r(U^{k+1}) - c_t^{r+1}(U^{k+1})$
$+ c_s^r(U^{k+1}) - c_t^r(U^{k+1})$

$\le 2\rho(P - (1 + \rho)(\beta + \delta + \epsilon)) + 2(1 + \rho)(\epsilon + \rho(4\beta + \delta + 5\epsilon))$
$+ c_s^r(U^{k+1}) - c_t^r(U^{k+1})$, by Lemma 5-15

$\le 2\rho(P - (1 + \rho)(\beta + \delta + \epsilon)) + 2(1 + \rho)(\epsilon + \rho(4\beta + \delta + 5\epsilon))$
$+ (1 - \rho)\beta_1 + 2\rho(2P + \beta + \delta + \epsilon)$, by Lemma 5-14

$\le \beta$, by inequality (2). ∎

This $\beta$ is approximately $6\epsilon$, which is slightly larger than the smallest one maintainable, $4\epsilon$. To
shrink it back down, P can be made slightly smaller than required by the maintenance algorithm,
as long as the lower bound of inequality (3) isn't violated. Since the synchronization procedure is
performed more often, the clocks don't drift apart as much, and consequently, they can be more
closely synchronized. Once the desired $\beta$ is reached, P can be increased again. (The
computational costs associated with performing the synchronization procedure and the possible
degradation of validity may make it advisable to resynchronize less frequently.)



5.6 Using Only the Start-up Algorithm 

A natural idea is to use Algorithm 5-1 solely, and never switch to the maintenance algorithm. Both
algorithms can synchronize clocks to within approximately $4\epsilon$, so such a policy would sacrifice
very little in accuracy. Using just the one algorithm is conceptually simpler and avoids
introducing the additional error during the switch-over. However, if the system does no work
during the period of time when processes have clocks with different indices, it is important to
minimize this interval. Algorithm 5-1 has such an interval of length $\delta + 3\epsilon$; for Algorithm 4-1, it is
approximately $\beta + 2\rho(\beta + \delta + \epsilon)$. Depending on the choice of values for the parameters,
Algorithm 4-1 may be superior in this regard.



Chapter Six
Conclusion



6.1 Summary 

In conclusion, we have presented a precise formal model to describe a system of distributed 
processes, each of which has its own clock. Within this model we proved a lower bound on how 
closely clocks can be synchronized even under strong simplifying assumptions. 

The major part of the thesis was the description and analysis of an algorithm to synchronize the
clocks of a completely connected network in the presence of clock drift, uncertainty in the
message delivery time, and Byzantine process faults. Since it does not use digital signatures, the
algorithm requires that more than two thirds of the processes be nonfaulty. Our algorithm is an
improvement over those in [7] based on Byzantine Agreement protocols in that the number of
messages per round is $n^2$ instead of exponential, and that the size of the adjustment made at each
round is a small amount independent of the number of faults.

The algorithm in [5] works for a more general communication network, and, since it uses digital 
signatures, only requires that more than half the processes be nonfaulty. However, the size of the 
adjustment depends on the number of faulty processes. 

The issue of which algorithm synchronizes the most closely is difficult to resolve because of
differing assumptions about the underlying model. For instance, Algorithm 4-1 of this thesis can
achieve a closeness of synchronization of approximately $4\epsilon$ in our notation. However, we assume
that local processing time is negligible; otherwise Lamport [8] claims that actually there is an
implicit factor of n in the $\epsilon$, in which case the closeness of synchronization achieved by our
algorithm depends on the number of processes as do those in [7].

We also modified Algorithm 4-1 to produce an algorithm to establish synchronization initially
among clocks with arbitrary values. This algorithm also handles clock drift, uncertainty in the
message delivery time, and Byzantine process faults. This problem, as far as we know, had not
been addressed previously for real-time clocks.



6.2 Open Questions 

It would be interesting to know more lower bounds on the closeness of synchronization
achievable. For example, a question posed by J. Halpern is to determine a lower bound when the
communication network has an arbitrary configuration and the uncertainty in the message
delivery time is different for each link.

There are also no known lower bounds for the case of clock drift and faulty processes. 

The validity of Algorithm 5-1 has not been computed. If this algorithm were used solely, knowing
how the processes' clocks increase in relation to real time would be of interest. Lower bounds in 
general for the validity conditions are not known. 

It seems reasonable that there is a tradeoff between the closeness of synchronization and the
validity, since the synchronization procedure must be performed more often in order to
synchronize more closely, but each resynchronization event potentially worsens the validity. This
tradeoff has not been quantified.

M. Fischer [4] has suggested an "asynchronous" version of Algorithm 5-1 to establish
synchronization. In his version, a nonfaulty process wakes up at an arbitrary time with arbitrary
values for its correction variable and array of differences. Every P as measured on its physical
(not logical) clock, the process performs the fault-tolerant averaging function and updates its
clock. It seems that the clock values should converge, but at what rate?

What kind of algorithms that use the fault-tolerant averaging function can be used in more general 
communication graphs? 

Another avenue of investigation is using the fault-tolerant averaging function together with the
capability for authentication to see if algorithms with higher fault tolerance than those in this
thesis and better accuracy than those in [5] can be designed.



Appendix A
Multisets



This Appendix consists of definitions and lemmas concerning multisets needed for the proofs of 
Lemmas 4-9 and 5-10. These definitions and lemmas are analogous to some in [1]. 

A multiset U is a finite collection of real numbers in which the same number may appear more
than once. The largest value in U is denoted max(U), and the smallest value in U is denoted
min(U). The diameter of U, diam(U), is max(U) - min(U). Let s(U) be the multiset obtained by
deleting one occurrence of min(U), and l(U) be the multiset obtained by deleting one occurrence
of max(U). If $|U| \ge 2f + 1$, we define reduce(U) to be $l^f s^f(U)$, the result of removing the f largest
and f smallest elements of U.

Given two multisets U and V with $|U| \le |V|$, consider an injection c mapping U to V. For any
nonnegative real number x, define $S_x(c)$ to be $\{u \in U: |u - c(u)| > x\}$. We define the x-distance
between U and V to be $d_x(U,V) = \min_c\{|S_x(c)|\}$. We say c witnesses $d_x(U,V)$ if $|S_x(c)| = d_x(U,V)$.

The x-distance between U and V is the number of elements of U that cannot be matched up with
an element of V which is the same to within x. If $|u - c(u)| \le x$, then we say u and c(u) are x-paired
by c. The midpoint of U, mid(U), is $\frac{1}{2}[\max(U) + \min(U)]$.
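For very small multisets the x-distance can be computed directly from the definition by trying every injection. The sketch below does exactly that; it is an illustration with a hypothetical function name, it takes exponential time, and it is not meant as an efficient algorithm.

```python
from itertools import permutations


def x_distance(u, v, x):
    """Brute-force d_x(U, V): the minimum, over all injections c from U into V,
    of the number of elements a in U with |a - c(a)| > x."""
    if len(u) > len(v):
        raise ValueError("the definition requires |U| <= |V|")
    best = len(u)
    # Each ordered selection of len(u) positions of v is one injection;
    # selecting by position treats repeated values as distinct elements.
    for image in permutations(v, len(u)):
        unmatched = sum(1 for a, b in zip(u, image) if abs(a - b) > x)
        best = min(best, unmatched)
    return best


# Every element of the first multiset has a partner within 0.2 in the second.
d_all_paired = x_distance([1.0, 2.0], [1.1, 2.1, 50.0], 0.2)
# Here 9.0 has no partner within 0.2, so one element remains unmatched.
d_one_left = x_distance([1.0, 2.0, 9.0], [1.1, 2.1, 50.0], 0.2)
```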

For any multiset U and real number r, define U + r to be the multiset obtained by adding r to every
element of U; that is, $U + r = \{u + r: u \in U\}$. It is obvious that mid and reduce commute
with this operation: mid(U + r) = mid(U) + r and reduce(U + r) = reduce(U) + r.
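The operations defined so far translate directly into code. In the following sketch a multiset is represented as a Python list; the function names are hypothetical stand-ins for reduce, mid, diam, and the shift U + r.

```python
def reduce_multiset(values, f):
    """Remove the f largest and f smallest elements (needs at least 2f + 1 values)."""
    if len(values) < 2 * f + 1:
        raise ValueError("need at least 2f + 1 values")
    ordered = sorted(values)
    return ordered[f:len(ordered) - f]


def mid(values):
    """Midpoint of the range of values."""
    return (max(values) + min(values)) / 2


def diam(values):
    """Diameter: largest value minus smallest value."""
    return max(values) - min(values)


def shift(values, r):
    """The multiset U + r."""
    return [v + r for v in values]


# Hypothetical example with f = 2: the two extreme values on each side are discarded.
u = [4.0, -100.0, 1.0, 2.0, 3.0, 250.0, 2.0]
reduced = reduce_multiset(u, 2)            # [2.0, 2.0, 3.0]
shifted_mid = mid(reduce_multiset(shift(u, 8.0), 2))
```

Shifting every element by r shifts the result of mid(reduce(·)) by the same r, which is the property used in the proof of Theorem 5-10.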

The next lemma bounds the diameter of a reduced multiset.

Lemma A-1: Let U and W be multisets such that |U| = n, |W| = n - f, and $d_x(W,U) = 0$,
where $n \ge 2f + 1$. Then

$\max(\mathit{reduce}(U)) \le \max(W) + x$ and $\min(\mathit{reduce}(U)) \ge \min(W) - x$.

Proof: We show the result for max; a similar argument holds for min. Let c witness
$d_x(W,U)$. Suppose none of the f elements deleted from the high end of U are x-paired
with elements of W by c. Since $d_x(W,U) = 0$, the remaining n - f elements of U are
x-paired with elements of W by c, and thus every element of reduce(U) is x-paired with
an element of W. Suppose max(reduce(U)) is x-paired with w in W by c. Then
$\max(\mathit{reduce}(U)) \le w + x \le \max(W) + x$.

Now suppose one of the elements deleted from the high end of U is x-paired with an
element of W by c. Let u be the largest such, and suppose it was paired with w in
W. Then $\max(\mathit{reduce}(U)) \le u \le w + x \le \max(W) + x$. ∎
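Lemma A-1 can be illustrated with a randomized check. In the sketch below, W plays the role of the nonfaulty values, U contains a copy of each element of W perturbed by at most x (so that $d_x(W,U) = 0$ by construction) plus f arbitrary values. The function names, value ranges, and trial counts are assumptions made for the illustration; passing the check is evidence, not a proof.

```python
import random


def reduce_multiset(values, f):
    """Remove the f largest and f smallest elements."""
    ordered = sorted(values)
    return ordered[f:len(ordered) - f]


def check_lemma_a1(n, f, x, trials=1000, seed=0):
    """Return True if no random trial violates the bounds of Lemma A-1."""
    rng = random.Random(seed)
    for _ in range(trials):
        w = [rng.uniform(0.0, 10.0) for _ in range(n - f)]
        # Each element of W is x-paired with a perturbed copy in U ...
        u = [value + rng.uniform(-x, x) for value in w]
        # ... and the remaining f elements of U are arbitrary.
        u += [rng.uniform(-1000.0, 1000.0) for _ in range(f)]
        reduced = reduce_multiset(u, f)
        if max(reduced) > max(w) + x or min(reduced) < min(w) - x:
            return False
    return True


holds_small = check_lemma_a1(n=4, f=1, x=0.05)
holds_larger = check_lemma_a1(n=7, f=2, x=0.1)
```

However extreme the f arbitrary values are, discarding the f largest and f smallest elements keeps the reduced multiset within x of the range of W.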

We show that the x-distance between two multisets is not increased by removing the largest (or
smallest) element from each.

Lemma A-2: Let U and V be multisets, each with at least one element. Then

d_x(l(U),l(V)) ≤ d_x(U,V) and d_x(s(U),s(V)) ≤ d_x(U,V).

Proof: We give the proof in detail for l; a symmetric argument holds for s. Let M = l(U)
and N = l(V). Let c witness d_x(U,V). We construct an injection c' from M to N and
show that |S_x(c')| ≤ |S_x(c)|. Since d_x(M,N) ≤ |S_x(c')| and |S_x(c)| = d_x(U,V), it follows
that d_x(M,N) ≤ d_x(U,V).

Suppose u = max(U) and v = max(V). (These are the deleted elements.)

Case 1: c(u) = v. Define c'(m) = c(m) for all m in M. Obviously c' is an injection.
|S_x(c')| ≤ |S_x(c)| since either S_x(c') = S_x(c) or S_x(c') = S_x(c) - {u}.

Case 2: c(u) ≠ v and there is no u' in U such that c(u') = v. This is the same as Case
1.

Case 3: c(u) ≠ v, and there is u' in U such that c(u') = v. Suppose c(u) = v'. Define
c'(u') = v' and c'(m) = c(m) for all m in M besides u'. Obviously c' is an injection. Now
we show that |S_x(c')| ≤ |S_x(c)|.

If u or u' or both are in S_x(c) then whether or not u' is in S_x(c') the inequality holds. The
only trouble arises if u and u' are both not in S_x(c) but u' is in S_x(c'). Suppose that is
the case. Then |u' - c'(u')| = |u' - v'| > x. There are two possibilities:

(i) u' > v' + x. Since u is not in S_x(c), |u - c(u)| = |u - v'| ≤ x. So v' ≥ u - x. Thus
u' > v' + x ≥ u - x + x, which implies that u' > u. But this contradicts u being the largest
element of U.

(ii) v' > u' + x. Since u' is not in S_x(c), |u' - c(u')| = |u' - v| ≤ x. So u' ≥ v - x. Thus
v' > u' + x ≥ v - x + x, which implies that v' > v. But this contradicts v being the
largest element of V. ∎
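
Lemma A-2 can be sanity-checked numerically. The sketch below is not part of the proof; it computes
the x-distance directly from the definition by trying every injection (feasible only for very small
multisets) and tests both inequalities on random integer multisets with |U| ≤ |V|. All function
names, the value ranges, and the trial count are illustrative assumptions.

```python
import random
from itertools import permutations
from typing import List


def x_distance(u: List[int], v: List[int], x: int) -> int:
    """d_x(U, V) by brute force: minimize |S_x(c)| over every injection c."""
    assert len(u) <= len(v)
    return min(
        sum(1 for i, j in enumerate(c) if abs(u[i] - v[j]) > x)
        for c in permutations(range(len(v)), len(u))
    )


def drop_largest(u: List[int]) -> List[int]:
    """l(U): delete one occurrence of max(U)."""
    return sorted(u)[:-1]


def drop_smallest(u: List[int]) -> List[int]:
    """s(U): delete one occurrence of min(U)."""
    return sorted(u)[1:]


def lemma_a2_holds(u: List[int], v: List[int], x: int) -> bool:
    """Check both inequalities of Lemma A-2 for one pair of multisets."""
    d = x_distance(u, v, x)
    return (x_distance(drop_largest(u), drop_largest(v), x) <= d
            and x_distance(drop_smallest(u), drop_smallest(v), x) <= d)


def random_check(trials: int = 300, seed: int = 0) -> bool:
    """Return True if no counterexample is found among random small cases."""
    rng = random.Random(seed)
    for _ in range(trials):
        size_v = rng.randint(1, 5)
        size_u = rng.randint(1, size_v)  # the definition needs |U| <= |V|
        u = [rng.randint(0, 8) for _ in range(size_u)]
        v = [rng.randint(0, 8) for _ in range(size_v)]
        if not lemma_a2_holds(u, v, rng.randint(0, 3)):
            return False
    return True
```

Calling random_check() exercises the lemma on a few hundred cases. A passing run is only evidence,
not a substitute for the case analysis above.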



The next lemma shows that the results of reducing two multisets, each of whose x-distance from a
third multiset is 0, can't contain values that are too far apart.

Lemma A-3: Let U, V, and W be multisets such that |U| = |V| = n and |W| = n - f,
where n > 3f. If d_x(W,U) = 0 and d_x(W,V) = 0, then

min(reduce(U)) - max(reduce(V)) ≤ 2x.

Proof: First we show that d_2x(U,V) ≤ f. Let c_U witness d_x(W,U) and c_V witness
d_x(W,V). Define an injection c from U to V as follows: if there is w in W such that c_U(w)
= u, then let c(u) = c_V(w); otherwise, let c(u) be any unused element of V. For each of
the n - f elements w in W, there is u in U such that u = c_U(w). Thus |u - c(u)| ≤ |u - w|
+ |w - c(u)| = |c_U(w) - w| + |w - c_V(w)| ≤ x + x = 2x. Thus |S_2x(c)| ≤ f, so d_2x(U,V) ≤
f.

Then by applying Lemma A-2 f times, we know that d_2x(reduce(U),reduce(V)) ≤ f.
Since |reduce(U)| = |reduce(V)| = n - 2f > f, there are u in reduce(U) and v in
reduce(V) such that |u - v| ≤ 2x. Thus min(reduce(U)) - max(reduce(V)) ≤ u - v ≤ 2x.
∎

Lemma A-4 is the main multiset result. It bounds the difference between the midpoints of two
reduced multisets in terms of a particular third multiset.

Lemma A-4: Let U, V, and W be multisets such that |U| = |V| = n and |W| = n - f,
where n > 3f. If d_x(W,U) = 0 and d_x(W,V) = 0, then

|mid(reduce(U)) - mid(reduce(V))| ≤ ½diam(W) + 2x.
Proof: |mid(reduce(U)) - mid(reduce(V))|

= ½|max(reduce(U)) + min(reduce(U)) - max(reduce(V)) - min(reduce(V))|

= ½|max(reduce(U)) - min(reduce(V)) + min(reduce(U)) - max(reduce(V))|.

If the quantity inside the absolute value signs is nonnegative, this expression is equal
to

½[max(reduce(U)) - min(reduce(V)) + min(reduce(U)) - max(reduce(V))]

≤ ½(max(W) + x - (min(W) - x) + min(reduce(U)) - max(reduce(V))), by applying
Lemma A-1 twice

= ½(diam(W) + 2x + min(reduce(U)) - max(reduce(V)))

≤ ½(diam(W) + 2x + 2x), by Lemma A-3

= ½diam(W) + 2x.

If the quantity inside the absolute value is nonpositive, then symmetric reasoning gives
the result. ∎
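
A small numeric illustration of Lemma A-4 follows. It is a sketch under stated assumptions: W is
read as n - f reference values, and U and V as two views of W in which each reference value is
perturbed by at most x and f further values are arbitrary, so that d_x(W,U) = d_x(W,V) = 0 holds by
construction. The function names and the sample numbers are hypothetical and chosen only for
illustration.

```python
from typing import List, Tuple


def reduce_multiset(u: List[float], f: int) -> List[float]:
    """Drop the f smallest and f largest elements."""
    s = sorted(u)
    return s[f:len(s) - f]


def mid(u: List[float]) -> float:
    return (max(u) + min(u)) / 2


def diam(u: List[float]) -> float:
    return max(u) - min(u)


def gap_and_bound(w: List[float], errs_u: List[float], errs_v: List[float],
                  bad_u: List[float], bad_v: List[float],
                  x: float) -> Tuple[float, float]:
    """Build U and V from W, then return (|mid gap|, Lemma A-4 bound)."""
    f = len(bad_u)
    n = len(w) + f
    assert len(bad_v) == f and n > 3 * f          # hypotheses of the lemma
    assert len(errs_u) == len(errs_v) == len(w)
    assert all(abs(e) <= x for e in errs_u + errs_v)
    # Each w is x-paired with its perturbed copy, so d_x(W, U) = d_x(W, V) = 0.
    u = [a + e for a, e in zip(w, errs_u)] + bad_u
    v = [a + e for a, e in zip(w, errs_v)] + bad_v
    gap = abs(mid(reduce_multiset(u, f)) - mid(reduce_multiset(v, f)))
    bound = diam(w) / 2 + 2 * x
    return gap, bound


if __name__ == "__main__":
    w = [10.0, 11.0, 12.0, 13.0, 14.0]            # n - f = 5 reference values
    gap, bound = gap_and_bound(
        w,
        errs_u=[0.5, -0.5, 0.5, 0.5, 0.5],
        errs_v=[-0.5, -0.5, -0.5, -0.5, -0.5],
        bad_u=[1000.0, 1000.0],                   # f = 2 arbitrary high values
        bad_v=[-1000.0, -1000.0],                 # f = 2 arbitrary low values
        x=0.5,
    )
    print(gap, bound)                             # prints 3.0 3.0
```

In this example n = 7 and f = 2. The reduced multisets are [12.5, 13.5, 14.5] and [9.5, 10.5, 11.5],
so the midpoints differ by 3.0, which equals ½diam(W) + 2x = 2.0 + 1.0. The bound is therefore
attained for these particular numbers.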
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